ParaGraph: A Graphical Tuning Tool
for Multiprocessor Systems 


• Seiichi Aikawa • Mayumi Kamiko • Takashi Chikayama


(Manuscript received November 30, 1992) 


Distributing the computational load across many processors is critical for efficient
program execution on multiprocessor systems, and finding a good load distribution
algorithm is one of the most important research topics in parallel processing. Tools
for evaluating load distribution algorithms are very useful for this kind of research.
This paper describes a system called ParaGraph that gathers periodic statistics on
the computational and communication load of each processor during program
execution, at both the higher programming-language level and the lower
implementation level, and presents them graphically to the user.


1. Introduction 

In the Japanese Fifth Generation Computer
Systems Project, parallel inference systems have
been developed to promote parallel software
research and development. The system adopts the
concurrent logic programming language KL1 as
its kernel and consists of a parallel inference
machine, PIM, and its operating system, PIMOS.

For efficient program execution, the computational
load must be appropriately distributed among the
processors. On scalable loosely-coupled
multiprocessor systems, load balancing and
minimization of communication overhead are
essential, but they become more difficult than on
tightly-coupled systems as communication costs
increase. Although many load distribution
algorithms have been developed, none is
sufficient to execute every program effectively.
Finding a good load distribution algorithm is one
of the most important research topics in
parallel processing.

Tools for evaluating load distribution
algorithms are very useful for this kind of
research. The objective of the ParaGraph system
is to help programmers design and evaluate load
distribution algorithms on loosely-coupled
multiprocessor systems. ParaGraph gathers profiling
information during program execution on the
parallel inference machine, PIM, and displays it
graphically using the X Window System.
Many performance displays have been devised for
utilization, communication, and task information.
For example, graphical meters represent processor
utilization, and graphical animation on a processor
configuration map represents the interprocessor
communication of message-passing programs. Such
specialized views provide an intuitive feeling for
dynamic behavior, but they make it difficult to
determine where the performance bottlenecks are.
Because the execution of parallel programs often
gives rise to complex phenomena, observing each
phenomenon in isolation cannot provide the full
information needed to detect performance
bottlenecks. For example, suppose that tasks are
not mutually independent and must communicate
closely with each other. The program is then less
efficient because of communication overhead, yet
graphical meters may show the processors working
hard even though most of the processing time is
being consumed by message handling. In this case,
it is useful to compare processor activity with the
frequencies of sending and receiving messages over
the execution time. Thus, bottlenecks are often
found by comparing several pieces of profiling
information with each other. Because profiling
information can be viewed as having three axes
(what, when, and where), the ParaGraph system
displays every kind of profiling information on
these three common axes so that the displays are
easy to compare.

Chapter 2 describes how load distribution can be
expressed in KL1 on PIM. Chapter 3 describes the
implementation of the ParaGraph system and its
graphical representation of program execution,
and chapter 4 discusses, with examples of various
programs, how useful the graphical displays are
for detecting performance bottlenecks.

The contents of this paper partially overlap
those of a previous paper.


2. Load distribution algorithms 
2.1 Load distribution in KL1 
The parallel inference machine runs a
concurrent logic programming language called
KL1. A KL1 program consists of a
collection of guarded Horn clauses of the form:

    H :- G_1, ..., G_m | B_1, ..., B_n   (m ≥ 0, n ≥ 0),

where H, G_i, and B_i are atomic formulas. H is
called the head, the G_i the guard goals, and the
B_i the body goals. The guard part consists of the
head and the guard goals, and the body consists of
the body goals. They are separated by the
commitment operator “|”. A collection of
guarded Horn clauses whose heads have the
same predicate symbol P and the same arity N
defines a procedure P with arity N. This is
denoted as P/N.

The guard goals wait for variables to be
instantiated (synchronization) and test them. When
the guard part of one or more clauses succeeds,
one of those clauses is selected and its body
goals are called. These body goals communicate
with each other through their common variables.
If a variable is not ready for testing in the
guard part because its value has not been
computed yet, testing is suspended.
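
The commitment rule can be illustrated outside KL1. The following Python sketch is not part of any KL1 implementation: the names UNBOUND and try_commit and the dictionary layout of a guard are assumptions made for illustration. It commits to the first clause whose guard succeeds, whereas KL1 may select any such clause.

```python
# Hypothetical marker for a variable whose value has not been computed yet.
UNBOUND = object()


def try_commit(clauses, args):
    """Try to commit a goal to one of its clauses.

    clauses: list of (guard, body) pairs, where guard is a dict with
             "needs" (names of variables the guard tests) and
             "test" (a predicate over the argument dict).
    args:    dict mapping variable names to values or UNBOUND.

    Returns ("commit", body), ("suspend", None) or ("fail", None).
    """
    suspended = False
    for guard, body in clauses:
        # A guard can only test values; if one is missing, it must wait.
        if any(args[name] is UNBOUND for name in guard["needs"]):
            suspended = True
            continue
        if guard["test"](args):
            return ("commit", body)
    return ("suspend", None) if suspended else ("fail", None)


# Guard modelled on "J>0, D=0" from a clause of next_queen/7.
example_clauses = [
    ({"needs": ["J", "D"],
      "test": lambda a: a["J"] > 0 and a["D"] == 0},
     "body of the clause"),
]
```

A goal whose arguments are still unbound is reported as suspended rather than failed, which is the behavior that the higher-level profiling described later counts as a suspension.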

In addition to the above basic mechanism,
there is a mapping facility which includes load
distribution specification. The programmer can
annotate the program by attaching pragmas to
the body goals to specify a processor (specified
by Goal@node(Proc)). The programmer must
tell the KL1 implementation which goals to
execute on which processors.

next_queen(N,I,J,B,R,D,BL) :- J>0, D=0 |
    BL = {BL0,BL1},
    R = {R0,R1},
    BL0 = [get(Proc)|BL2],
    try_ext(N,I,J,B,R0,D,BL2)@node(Proc),    % processor specification
    next_queen(N,I,J-1,B,R1,D,BL1).


Fig. 1— An example of a KL1 program.

Figure 1 shows a part of a KL1 program. If 
the goal next_queen/7 is committed to this 
clause, its body goals are called. The goal 
try_ext/7 has a processor specification, and it is 
to be executed on processor number “Proc”. 
This processor number can be dynamically 
computed. 


2.2 Design issues 

Load balancing derives maximum performance
by efficiently utilizing the processing power
of the entire system. This is done by partitioning
a program into mutually independent or almost
independent tasks, and distributing the tasks to
processors. Many load balancing schemes have
been devised, but they are tightly coupled to
particular applications. Therefore, programmers
have to build load distribution algorithms for
their own applications.

To distribute the computational load 
efficiently, the programmer should keep in mind 
the following points. Since load distribution is 
implemented by using goals, the programmer 
should understand the execution behavior of 
each goal. When goals are executed on a 
loosely-coupled multiprocessor, the programmer 
should investigate the load on individual proces- 
sors and the communication overhead between 
processors. 

For evaluating load distribution algorithms,
tools must provide many graphic displays so that
the programmer can understand the computational
and communication load of each processor at
both the higher program level and the lower
implementation level. No single display and no
single profiling level can provide the full information
needed to detect performance bottlenecks.


3. System overview 
3.1 Gathering information 

To statistically profile large-scale program
execution, the KL1 implementation provides two
information gathering facilities: low-level profiling
and higher-level profiling. The KL1 implementation
provides these facilities as language primitives,
to minimize any undesirable influence on the
execution behavior of programs. These facilities
have been implemented at the firmware level.
The profiling facilities are summarized as
follows.

1) Low-level profiling 

Profiles the low-level behavior of the proces- 
sor, such as how much CPU time went to the 
various basic operations required for program 
execution. 

2) Higher-level profiling 

Profiles the higher-level behavior of the 
processor, such as how many times each piece of 
the program was executed. 

To minimize the perturbation, the gathered
profiling information resides in each processor’s
local memory during program execution, and
after execution, ParaGraph collects this information
and converts it into a standard form.
Since profiling information is automatically
produced by the KL1 implementation, programmers
do not have to modify the application
programs.

3.1.1 Low-level profiling 

The basic low-level activities can be categorized
into computation, communication, garbage
collection, and idling. Computation means
normal program execution, such as goal reductions
and suspensions; communication means
sending and receiving interprocessor messages;
garbage collection means reclaiming memory that
is no longer in use; and idling means doing nothing.
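
As an illustration of how such a four-way breakdown might be turned into the utilization rates that the displays plot, here is a minimal Python sketch. The function name, the category keys and the input layout are assumptions for illustration only; the actual measurement is done by the KL1 implementation at the firmware level.

```python
# The four low-level categories named in the text (key spellings are assumed).
CATEGORIES = ("computation", "communication", "garbage_collection", "idling")


def utilization(times):
    """Convert the time spent per category into fractions of the interval.

    times: dict mapping a category name to the time one processor spent in
           that category during one profiling interval (any consistent unit).
    """
    total = sum(times.get(c, 0) for c in CATEGORIES)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: times.get(c, 0) / total for c in CATEGORIES}


# Illustrative numbers, not measurements from the paper.
sample = utilization({"computation": 80, "communication": 10,
                      "garbage_collection": 5, "idling": 5})
```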

The processor profiling facility measures
how much time went to each category for each
processor. Such information can be gathered
periodically to show gradual changes of behavior.
The profiling facility can also measure the
frequencies of sending and receiving various kinds
of interprocessor messages:

1) A throw_goal message transfers a KL1 goal
with a throw goal pragma to a specified
processor.

2) A read message requests a value
from a remote processor when a clause
selection condition requires it.

3) An answer_value message replies to a read
message when the requested value becomes
available.

4) A unify message requests body unification
(giving a value to a variable).
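
The periodic frequency counts can be pictured with the following Python sketch. It assumes a hypothetical event log of (timestamp, kind) pairs purely for illustration; the real facility keeps counters inside each processor rather than a log, and the function name is not from the paper.

```python
from collections import Counter

# The four message kinds listed above.
MESSAGE_KINDS = ("throw_goal", "read", "answer_value", "unify")


def count_messages(events, interval):
    """Count messages of each kind per profiling interval.

    events:   iterable of (timestamp, kind) pairs (hypothetical log format).
    interval: length of one profiling interval, in the timestamp's unit.
    Returns a list with one Counter per interval, in time order.
    """
    buckets = {}
    for timestamp, kind in events:
        if kind not in MESSAGE_KINDS:
            raise ValueError(f"unknown message kind: {kind}")
        index = int(timestamp // interval)
        buckets.setdefault(index, Counter())[kind] += 1
    n_intervals = max(buckets) + 1 if buckets else 0
    return [buckets.get(i, Counter()) for i in range(n_intervals)]


# Illustrative events with a 2-second interval.
log = [(0.5, "read"), (1.0, "answer_value"), (2.5, "read"), (3.9, "unify")]
per_interval = count_messages(log, 2.0)
```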

3.1.2 Higher-level profiling 

KL1 provides a mechanism for grouping
goals and controlling their execution at a
meta-level. This mechanism can be considered to
be an interpreter for the KL1 language. It also
provides a profiling facility at a higher level than
processor profiling. Low-level profiling gathers a
number of important statistics from many
aspects that help in analyzing performance
bottlenecks, but it provides no information on where
in the program such behavior originates.

To correlate execution behavior with a 
portion of the program, higher-level profiling 
measures how many times goals associated with 
each predicate are reduced or suspended (due to 
unavailability of data required for reduction). 
Transition of behavior can be observed by 
periodically gathering the information. 
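
A minimal Python sketch of this kind of bookkeeping follows. The class and method names are hypothetical, and the sketch only mirrors what is counted (reductions and suspensions per predicate per cycle), not how the KL1 implementation counts it.

```python
from collections import defaultdict


class GoalProfile:
    """Counts reductions and suspensions per predicate in each cycle."""

    def __init__(self):
        # counts[cycle][predicate] -> [reductions, suspensions]
        self.counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, cycle, predicate, suspended=False):
        """Record one reduction, or one suspension if suspended is True."""
        self.counts[cycle][predicate][1 if suspended else 0] += 1

    def totals(self):
        """Return predicate -> (reductions, suspensions) over all cycles."""
        out = defaultdict(lambda: [0, 0])
        for per_cycle in self.counts.values():
            for predicate, (reductions, suspensions) in per_cycle.items():
                out[predicate][0] += reductions
                out[predicate][1] += suspensions
        return {p: tuple(v) for p, v in out.items()}


profile = GoalProfile()
profile.record(0, "next_queen/7")
profile.record(0, "next_queen/7")
profile.record(1, "try_ext/7", suspended=True)
profile.record(1, "next_queen/7")
```

Keeping the cycle as the outer key preserves the transition of behavior over time, while totals() gives the per-predicate ranking.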


3.2 Graphic displays 

The profiling information can be viewed as
having three axes: what, when, and where. In
sequential execution, “where” is a constant and
the “when” aspect is not important, since the
execution order is strictly determined. Therefore,
simple tools like gprof, provided with
UNIX (see Note), suffice. However, all three axes
are important when parallel execution is concerned.

If such massive information is not presented
carefully, the user might be more confused than
informed. Therefore, ParaGraph provides
graphic displays based on the three axes. We named
each representation using the terms “What,”
“When,” and “Where.” The term “What” is the
visualization target corresponding to the type of
profiling information, such as low-level processor
behavior, higher-level processor behavior, or
interprocessor message frequencies. The terms
“When” and “Where” indicate the time, expressed by
a cycle number, and the processor number,
respectively.


Fig. 2—Examples of graphic displays: a What ×
When view (top-left), an overall What ×
Where view (top-right), a When ×
Where view (bottom-left), and a menu-
oriented user interface (bottom-right).


Note: The UNIX operating system was developed
and is licensed by UNIX System Laboratories,
Inc.
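
One way to picture the three axes is as a table keyed by (what, when, where), from which each two-axis view is obtained by summing over the remaining axis. The Python sketch below is only an illustration under that assumption: the function name and data layout are hypothetical, and for utilization rates an average rather than a sum would be the appropriate reduction.

```python
AXES = ("what", "when", "where")


def project(cube, keep):
    """Collapse a (what, when, where) table onto two of its axes.

    cube: dict mapping (what, when, where) tuples to a volume (a count).
    keep: the two axis names to keep, e.g. ("what", "when").
    Returns a dict keyed by the kept coordinates, summing over the other axis.
    """
    indices = [AXES.index(axis) for axis in keep]
    view = {}
    for key, volume in cube.items():
        reduced = tuple(key[i] for i in indices)
        view[reduced] = view.get(reduced, 0) + volume
    return view


# Illustrative counts: (item, cycle number, processor number) -> volume.
cube = {
    ("reduction", 0, 0): 5,
    ("reduction", 0, 1): 7,
    ("suspension", 0, 0): 1,
    ("reduction", 1, 0): 2,
}
what_when = project(cube, ("what", "when"))
when_where = project(cube, ("when", "where"))
```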

Figure 2 shows the graphic displays of
ParaGraph. These displays show the execution
behavior of an all-solution search program for
the N-queens problem.


Every type of profiling information can be
displayed easily in the views described below
through a menu-oriented user interface such as the
bottom-right window in Fig. 2. If the window
is too small to display everything in detail,
a coarser display that aggregates several cycles or
several processors can be used to see the
overall behavior at a glance. Scrolling in the
vertical and horizontal directions is also
possible if details are to be examined. It is also
possible to display only selected “What” items.
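
The coarser display can be thought of as summing blocks of the processor-by-cycle grid. The following Python sketch shows one such aggregation; the function name and the choice of summation (rather than averaging) are assumptions, since the text does not say how ParaGraph combines the cells.

```python
def aggregate(grid, cycle_step, proc_step):
    """Merge neighbouring cells of a processor-by-cycle grid.

    grid:       list of rows, grid[p][c] = volume on processor p in cycle c.
    cycle_step: number of consecutive cycles merged into one column.
    proc_step:  number of consecutive processors merged into one row.
    Returns a smaller grid whose cells are the sums of the merged blocks.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    coarse = []
    for p0 in range(0, rows, proc_step):
        coarse_row = []
        for c0 in range(0, cols, cycle_step):
            block_sum = sum(
                grid[p][c]
                for p in range(p0, min(p0 + proc_step, rows))
                for c in range(c0, min(c0 + cycle_step, cols))
            )
            coarse_row.append(block_sum)
        coarse.append(coarse_row)
    return coarse


# Two processors, three cycles; merge pairs of cycles and both processors.
small = aggregate([[1, 2, 3], [4, 5, 6]], cycle_step=2, proc_step=2)
```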

3.2.1 A What × When view

There are two kinds of views in terms of
“What” and “When” items. One is a What ×
When view, which shows the behavior of each
“What” item during execution. A graph is
displayed for each “What” item, in order of total
volume. The x axis is the cycle number, and the
y axis is the rate of processor utilization, the
number of messages, or the number of
reductions or suspensions, depending on the
type of profiling information. Since every graph
is drawn with the same scale on the vertical
axis, it is easy to compare “What” items.

The other is an overall What × When view,
which shows the behavior of all “What” items
during execution. The “What” items are stacked
in the same graph, each displayed by a line. The y
axis represents the average rate of processor
utilization, the total number of messages, or
the total number of reductions and suspensions,
depending on the type of profiling information.

These views are helpful, for example, if a
program has sequential bottlenecks such as tight
synchronization. In this case, the number of goal
reductions will drop during some portion of
program execution. Such a problem is
easily detected by observing program execution.

The top-left window in Fig. 2 shows
received message frequencies on all processors
with a What × When view. In this window, the
frequencies of four kinds of received messages are
displayed, one graph per kind, in order of the
total number of messages received. The other
messages can be displayed by scrolling vertically.

From this, we know that each kind of message
was received on all processors fewer than
2 500 times per interval (an interval is 2 seconds).
As this program is divided into mutually independent
subtasks, the communication message frequency is
very low.

3.2.2 A When × Where view

A When × Where view shows the behavior
of all “What” items on each processor. Each
processor is displayed with various color patterns
that indicate volume. The relationship
between color patterns and volume is shown in
the bottom right corner. The brighter the
pattern, the busier the processor. Volume means
the rate of processor utilization, the number of
messages, or the number of reductions or
suspensions, depending on the type of
profiling information. It is also possible to display
only selected “What” items instead of all of
them.
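
Mapping a volume onto a small set of color patterns amounts to quantization. The Python sketch below assumes eight equally wide levels and a hypothetical function name; the text does not state how many patterns ParaGraph uses or where their boundaries lie.

```python
def pattern_index(volume, max_volume, levels=8):
    """Map a volume to a pattern number from 0 (darkest) to levels - 1.

    volume:     measured value for one processor in one cycle.
    max_volume: value that should map to the brightest pattern.
    levels:     number of distinct patterns (an assumed default).
    """
    if max_volume <= 0 or volume <= 0:
        return 0
    # Equal-width bins; values at or above max_volume use the brightest one.
    return min(levels - 1, int(volume * levels / max_volume))


# Illustrative values: an idle cell, a half-loaded cell and a saturated cell.
indices = [pattern_index(v, 100) for v in (0, 50, 100)]
```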

The bottom-left window in Fig. 2 is a When
× Where view. The x axis is the cycle number,
and the y axis is the processor number. This
view displays the execution behavior of all goals
on a 32-processor machine. The color patterns
indicate the number of reductions. The relationship
between the number of reductions and the color
pattern is displayed in the bottom right corner.

From this, we know that the work load on
each processor was well balanced, and that the
program executed about 70 000 reductions per
interval on each processor throughout the
execution.

3.2.3 A What × Where view

There are two kinds of views in terms of
“What” and “Where” items. One is a What ×
Where view, which shows the load balance of
each “What” item on each processor. A bar
chart is displayed for each “What” item, in order of
total volume. The x axis represents the processor
number, and the y axis represents the rate of
processor utilization, the number of messages,
or the number of reductions or suspensions,
depending on the type of profiling information.
All bar charts are drawn with the same
scale on the vertical axis, so it is easy to
compare the volume of each “What” item.

The other is an overall What × Where view,
which shows the load balances of all “What”
items on each processor. The “What” items are
stacked in the same bar chart, each displayed by a
certain color pattern. The y axis represents the
average rate of processor utilization, the total
number of messages, or the total number of
reductions or suspensions, depending on the
type of profiling information. The relationship
between each category and its color pattern is
displayed in the top-right corner.

The top-right window in Fig. 2 shows the
low-level behavior of the processors with an
overall What × Where view. In this window, each
category of low-level behavior is displayed
with a different color pattern.

From this, we see that computation took
more than 80 % of total execution time on average,
while communication took about 10 % on
processor No. 0 and less than 5 % on the others.
Since processor No. 0 collected answer values
from the others, its communication share was higher.
Thus, this view shows that most of the processors
ran at full capacity and that this example program
was executed very efficiently on each processor.


FUJITSU Sci. Tech. J., 29, 2, (June 1993)


4. Examples 

This chapter discusses which views to use to
diagnose various performance bottlenecks. For
efficient program execution on multiprocessor
systems, the following phases are usually
repeated until a solution is reached:

1) a program is partitioned into subtasks,

2) the subtasks are mapped to processors
dynamically, and

3) each processor runs its subtasks while
communicating with the others.

Various problems are often encountered 
when executing a program on multiprocessor 
systems. We will show how graphic displays in 
both the higher program and lower implemen- 
tation levels are helpful with performance 
problems. 


4.1 Uneven partitioning 

When granularity differs greatly between
subtasks, it is useful to observe the low-
level processor behavior with a When × Where
view and the higher-level processor behavior
with a What × Where view. From the When ×
Where view, we find which processors run
at full capacity and which are idle. From the
What × Where view, we determine which goals
caused the load imbalances.

The left window in Fig. 3 shows the low-
level behavior of each processor with a When
× Where view, while the right window
shows the higher-level behavior of the same
processors with a What × Where view on a
21-processor machine. The example program is a
logic design expert system which generates a
circuit based on a behavior specification. The
strategy of parallel execution is that the
system first divides a behavior specification into
sub-specifications, next designs subcircuits based
on the sub-specifications on each processor, and
finally gathers the partial results together and
combines them. The When × Where view
suggests that most of the processors ran almost
equally, but that processors No. 3 and No. 6 ran
at full capacity and processors No. 0, No. 2, and
No. 5 were idle. The What × Where view indicates
which goals were executed on each processor.


Fig. 3— The low-level processor behavior (left) and
execution behavior of goals (right).

From this, we know that processors No. 3
and No. 6 were allocated very complicated
tasks, and processors No. 0, No. 2, and No. 5
were allocated very tiny tasks. That is, uneven
partitioning of the behavior specification must
have caused the performance bottleneck.
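
The reading of the When × Where view described above can be mimicked numerically. The Python sketch below flags fully loaded and idle processors and computes a simple imbalance ratio; the thresholds, function names and sample figures are assumptions chosen for illustration, not values taken from the experiment.

```python
def classify_processors(busy, high=0.9, low=0.1):
    """Split processors into fully loaded and idle ones.

    busy: list of busy fractions (0.0 to 1.0), indexed by processor number.
    high: fraction at or above which a processor counts as running fully.
    low:  fraction at or below which a processor counts as idle.
    Returns (full, idle) as two lists of processor numbers.
    """
    full = [p for p, b in enumerate(busy) if b >= high]
    idle = [p for p, b in enumerate(busy) if b <= low]
    return full, idle


def imbalance(loads):
    """Ratio of the maximum load to the mean load (1.0 means balanced)."""
    if not loads:
        return 0.0
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Illustrative busy fractions for seven processors.
full, idle = classify_processors([0.05, 0.5, 0.02, 0.95, 0.5, 0.08, 0.97])
```

A ratio well above 1.0 combined with a non-empty idle list is the numerical counterpart of the uneven partitioning seen in the figure.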


4.2 Load imbalance 

If a mapping algorithm has problems such
as allocating subtasks to the same processor, it
is useful to observe the low-level behavior of the
processors with a When × Where view and the
higher-level behavior with a What × Where view.
From the When × Where view, we see which
processors run at full capacity and which are idle,
and from the What × Where view, we see the load
balance of each goal. Using both views, we can
determine how to distribute the imbalanced goals
among the processors.

The bottom-left window of Fig. 4 shows the
low-level behavior of the processors with a When
× Where view; the top-left and top-right windows
show the higher-level behavior of the processors
with an overall What × Where view and a What ×
Where view, respectively. The example program is
part of a theorem prover which evaluates whether
an input formula is a tautology. The strategy
consists of two steps:

1) convert the input formula to clause form (i.e.
conjunctive normal form),

2) evaluate the clause form and determine
whether it is a tautology.

Step 1 is executed in parallel as follows.
First, the main task partitions the input formula
into subformulas. Second, it generates subtasks to
convert them to subclause forms, and finally, it
distributes the subtasks to many processors
dynamically. These steps are repeated recursively
until the subformulas are converted to subclause
forms. Step 2 is executed sequentially on
processor No. 0.


Fig. 4— Low-level processor behavior (bottom-left),
the load balances of all goals (top-left),
and the load of each goal (top-right).

The When × Where view in the bottom-left
window suggests that only certain processors
(processors No. 6-15 and No. 23-31) ran at full
capacity and that the others were mostly idle. The
overall What × Where view in the top-left window
also suggests that most of the goals were executed
on the same processors; in particular, the numbers
of reductions of the top five goals were higher
than those of the other goals.

We can check the load of each goal on each
processor from the What × Where view in the
top-right window. These goals were executed on
certain processors and were the cause of the
load imbalances. From this, we see that the
mapping algorithm has to be changed to flatten
the load profile, so that all processors are used
efficiently.


Fig. 5— The load balances of goals (left) and
low-level processor behavior (right).


4.3 Large communication overhead 

When subtasks are not mutually independent
and must communicate with each other closely,
the program is less efficient because of
communication overhead. In this case, the
low-level behavior of the processors with an
overall What × Where view and the frequencies of
sending and receiving messages with a What ×
Where view are helpful. From the overall What
× Where view, we learn how much time has
been consumed by message handling on each
processor, while the What × Where view shows
us what kinds of messages each processor has
sent or received.

Figure 5 displays the execution behavior of
an improved version of the program described in
section 4.2. The left window shows the load
balances of all goals on a 32-processor machine
with an overall What × When view. This view
shows that the work load on each processor was
balanced over the whole execution, but that
execution was not efficient because of large
communication overhead. This is confirmed by the
low-level behavior of the processors, shown with
an overall What × Where view in the right window.

Figure 6 shows the same program execution
as Fig. 5. The left window shows the proportion
of time spent handling sent and received messages
with a What × Where view, and the right window
shows the frequencies of four kinds of received
interprocessor messages with a What × When view.
The right window of Fig. 5 suggests that the load
average on each processor was about 80-85 %, but
that computation accounted for only about 20 % on
each processor. Most of the processing power was
consumed by message handling: sending and
receiving messages took more than 60 % of total
execution time.


Fig. 6— Low-level processor behavior about message
handling (left) and message frequencies
(right).

The left window of Fig. 6 shows that the
message handling time was almost equal on each
processor at each moment in time. The right
window in Fig. 6 shows that, per interval on all
processors, the read message was received about
185 000 times, the answer_value message about
170 000 times, the unify message 100 000 times,
and the throw_goal message about 66 000 times.
Compared with the message frequencies of the
N-queens program (see the top-left window of
Fig. 2), the tasks generated in this program
communicated closely with each other across
processors.

From this, we know that as work loads are
distributed more and more finely, it becomes easier
to balance the work load on each processor, but
communication overhead also increases and
performance is thus lowered. As a result, we
have to redesign or improve the way the program
is divided into subtasks, because the problem
described above was caused by generated subtasks
that were not mutually independent.
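
The diagnosis in this section (processors look busy, yet most of the time goes to message handling) can be expressed as a simple check. The Python sketch below is a hypothetical helper, and the sample fractions are illustrative values chosen to be roughly in line with the figures quoted above, not measured data.

```python
def overhead_report(profile):
    """Summarize how much of the busy time is communication.

    profile: dict mapping "computation", "communication",
             "garbage_collection" and "idling" to fractions of execution
             time, averaged over processors (hypothetical layout).
    """
    computation = profile.get("computation", 0.0)
    communication = profile.get("communication", 0.0)
    busy = 1.0 - profile.get("idling", 0.0)
    ratio = communication / computation if computation else float("inf")
    return {
        "busy": busy,
        "comm_to_comp": ratio,
        # Busy processors can still be inefficient if this flag is set.
        "communication_bound": communication > computation,
    }


report = overhead_report({
    "computation": 0.20,
    "communication": 0.62,
    "garbage_collection": 0.03,
    "idling": 0.15,
})
```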


5. Conclusion 

We developed the ParaGraph system on
parallel inference machines to provide graphic
displays of processor utilization, interprocessor
communication, and execution behavior of parallel
programs. Experiments with various programs
have indicated that graphic displays are
helpful in dividing work loads evenly and in
determining where the bottlenecks are on
multiprocessor systems.

We released a version last year as a tuning
tool for PIMOS, but have experienced some
problems. In the future, we will improve the
system considering the following points. First,
real-time performance visualization tools are
needed. Although displaying execution behavior
in real time perturbs the program being monitored,
it is useful not only in early tuning but
also in debugging, for example in detecting
deadlocks and infinite loops. To develop such a tool,
low-overhead instrumentation techniques and
new displays that programmers can easily
understand as they appear in real time must be
devised.

Second, tools which can directly visualize the
location of performance bottlenecks are
needed. Massively parallel machines that have
thousands of processors, and programs that run
for a long time, produce a large amount of
profiling information. A simple expansion of our
system would have difficulty processing or
displaying such a vast quantity of information. To
solve such problems, analysis techniques that
indicate bottlenecks directly will be needed. We
will study automatic analysis techniques and
graphical displays of their results (we call this
bottleneck visualization). One such approach is
critical path analysis, which identifies the path
through the program that consumed the most
time.
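
As a generic illustration of critical path analysis (not the authors' planned technique), the Python sketch below finds the most time-consuming chain in a task dependency graph. The task names, durations and the assumption that the dependency graph is acyclic are all hypothetical.

```python
def critical_path(durations, deps):
    """Find the dependency chain with the largest total duration.

    durations: dict mapping a task name to its execution time.
    deps:      dict mapping a task name to the list of tasks it waits for.
               The dependency graph is assumed to be acyclic.
    Returns (total_time, path), where path lists the tasks in order.
    """
    best = {}  # task -> (finish time, path ending at that task)

    def finish(task):
        if task not in best:
            # Longest chain among the predecessors, or an empty chain.
            prev = max((finish(d) for d in deps.get(task, [])),
                       key=lambda result: result[0], default=(0, []))
            best[task] = (prev[0] + durations[task], prev[1] + [task])
        return best[task]

    return max((finish(t) for t in durations), key=lambda result: result[0])


# Illustrative graph: c waits for a and b, d waits for c.
total, path = critical_path({"a": 3, "b": 2, "c": 4, "d": 1},
                            {"c": ["a", "b"], "d": ["c"]})
```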
This paper presents a test program generator called TPGEN, which is based on
attributed grammar. TPGEN generates a wide variety of test programs, mainly for
programming language processors. The generated test programs are executable and
have self-checking code for validating execution results. The generated test programs
are guaranteed to have specified testing coverage.

TPGEN simulates the execution of a test program being generated and if an 
abnormal event such as zero divide or infinite loop is detected, TPGEN back-tracks to 
the specified position and selects an alternative production rule to avoid such 
abnormal execution. Introduction of this mechanism has succeeded in generating a 
wide variety of programs with complex structures. 
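
The idea of simulating while generating, and backing up when the simulation hits an abnormal event, can be sketched in a few lines of Python. This is a deliberately tiny illustration and not TPGEN itself (which is written in LISP): the function name, the candidate constants and the single division statement are assumptions.

```python
import random


def generate_division(rng):
    """Generate one integer division statement that is safe to execute.

    rng: a random.Random instance used to order the alternatives.
    Returns (statement_text, expected_value).
    """
    dividend = 100
    # Hypothetical alternative productions for the divisor constant.
    alternatives = [0, 1, 2, 5]
    rng.shuffle(alternatives)
    for divisor in alternatives:
        try:
            # Simulate the statement while it is being generated.
            expected = dividend // divisor
        except ZeroDivisionError:
            # Abnormal event detected: back-track and try the next
            # alternative production instead.
            continue
        return f"IJK1 = {dividend} / {divisor}", expected
    raise RuntimeError("no alternative production avoids the abnormal event")


statement, expected = generate_division(random.Random(0))
```

Whatever order the alternatives are tried in, the statement that is finally emitted never divides by zero, and its simulated result is available for a later self-check.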


1. Introduction 

In the past, the formal definition of
programming languages has been of interest
mainly for the automatic generation of language
processors such as compilers, interpreters and
syntax-directed editors. There have also been
studies on its application to the automatic
generation of test cases or test programs. Automatic
generation of test programs typically defines a test
grammar in a formal way, such as BNF, and
generates test programs from this description. It
is a relatively simple task to randomly generate
test programs according to a syntax description
of the language, but the generation of practical
executable programs requires solutions to several
problems.

The first problem is to resolve contextual
dependencies when generating correct programs.
Duncan has resolved this problem using attributed
grammar and has developed a test program
generator using a parser generator technique.
But in our experience, the use of a general
parser generator technique requires the description
of all the information regarding attributes
and the passing of attributes, and thus results in
a large and unwieldy description.

The second problem is in the generation of
test programs with self-checking code. If the
confirmation of test results requires a large amount
of manpower, the benefit of automatic
generation is reduced. Earlier reports show how to
describe the semantics of language elements and give
predicted execution results, but include no
mechanism for automatic checking of execution
results. D. L. Bird and C. U. Munoz have
described the automatic generation of test programs
which are as executable and self-checkable
as possible, although there are some
restrictions on generated program structures.

The third problem concerns functional
coverage. To assure adequate coverage we must
be able to generate executable programs with
complex structures such as loops. The reports
that have been published so far describe
relatively simple cases.

Other reports have pointed out that
PROLOG is very useful when prototyping a test
case or test data generator. We implemented
TPGEN in LISP because of LISP’s facilities,
such as manipulation of pointer variables and
complex data structures, which were necessary to
make our tool more practical.

Our test program generator, TPGEN, has
been in use for software product inspection for
more than three years. In this paper, we present
how TPGEN generates executable test programs
with self-checking code, how it assures testing
coverage, and how it improves the quality of
generated programs, together with some empirical
results obtained in comparison with conventional
methods.


FUJITSU Sci. Tech. J., 29, 2, pp. 128-136 (June 1993)


H. Kawata et al.: A Practical Test Program Generator Based on Attributed Grammar


2. Outline of generation principle of TPGEN 

In a syntax-directed definition, each production
rule A → α has associated with it a set of
semantic rules of the form b := f(c_1, c_2, ..., c_k),
where f is a function, b is a synthesized attribute
of A or an inherited attribute of one of the
grammar symbols on the right side of the
production, and c_1, c_2, ..., c_k are attributes
belonging to the grammar symbols of the
production. Functions in semantic rules are often
written as expressions. Occasionally, the only
purpose of a semantic rule in a syntax-directed
definition is to create a side-effect. Such
semantic rules are written as procedure calls or
program segments. They can be thought of as
rules defining the values of dummy production.
An attribute grammar is a syntax-directed
definition in which the functions in semantic
rules cannot have a side-effect.

The semantic definition of TPGEN consists
of two descriptions: one resolves context dependency
to generate grammatically correct programs,
and the other simulates execution of the
generated programs. In generating a proper
expression, for example, its type is passed to the
production rule of an expression as an inherited
attribute. A value attribute is introduced for
each non-terminal so that the generated program
can be simulated. Introducing such attributes
is not enough to complete the semantic definition
of TPGEN, as will be explained later.

We will now explain how TPGEN generates 
test programs from the language definition. 
Figure 1 shows a simplified definition of a small 
subset of FORTRAN (see the appendix for more 
details). In this figure, symbols enclosed by
< and > denote non-terminals, and symbols
surrounded by “ and ” denote terminals.

If we select production rules in Fig. 1 in the
order of (1), (2-2), (4), (6-1), (7-1), ..., then parts (a)
and (b) of Fig. 2a) will be generated. Part (b) of
Fig. 2a) is generated based on the description of
‘insert-checking-routine’, and part (c) is an
initialization statement which is generated with
a declarative statement.


<program>     -> <stmt> &                               ... (1)
<stmt>        -> <assign-stmt> !                        ... (2-1)
                 <if-stmt> &                            ... (2-2)
<assign-stmt> -> <var> "=" <expr>
                 (insert-checking-routine) &            ... (3-1)
<if-stmt>     -> "IF (" <b-expr> ") THEN" <stmt>
                 "ELSE" <stmt> "ENDIF" &                ... (4)
<expr>        -> <primary> !                            ... (5-1)
                 <primary> "+" <primary> !              ... (5-2)
                 <primary> "-" <primary> &              ... (5-3)
<b-expr>      -> <primary> ".GT." <primary> !           ... (6-1)
                 <primary> ".LT." <primary> !           ... (6-2)
                 <primary> ".EQ." <primary> &           ... (6-3)
<primary>     -> <ref-var> !                            ... (7-1)
                 <const> &                              ... (7-2)


Fig. 1—A simplified example of language definition (syntax only).


      IJK1 = 150                                          ... (c)
      IF (IJK1 .GT. 100) THEN                             ... (a)
        IJK1 = IJK1 - 60                                  ... (a)
        CALL CHECK (1, 90, IJK1, 'ASSIGN STMT INVALID')   ... (b)
      ELSE                                                ... (a)
        IJK2 = 30                                         ... (a)
        CALL CHECK (2, 30, IJK2, 'ASSIGN STMT INVALID')   ... (b)
      ENDIF                                               ... (a)
      STOP
      END

a) Example (1)

      IJK1 = 150                                          ... (c)
      IF (IJK1 .GT. 100) THEN                             ... (a)
        IJK1 = IJK1 - 60                                  ... (a)
      ELSE                                                ... (a)
        IJK2 = 30                                         ... (a)
      ENDIF                                               ... (a)
      CALL CHECK (1, 90, IJK1, 'ASSIGN STMT INVALID')     ... (b)
      STOP
      END

b) Example (2)


Fig. 2— Examples of generated text.

The procedure adopted by TPGEN to 
generate executable test programs with self- 
checking code is as follows: 

1) TPGEN selects production rules randomly,
or considering functional coverage if
specified, starting from <program>, and
generates source text. Usually, we define a
<program> so that it includes several
executable statements with several declarative
statements and initialization statements.
The number of statements included in a
<program> is determined randomly between
specified minimum and maximum integers.

2) TPGEN generates source text based on the
selected production rules, and each time a
production rule is applied, the generated text is
simulated. If an abnormal event such as an
overflow is detected, an alternative is
selected. If all alternatives result in abnormal
execution, TPGEN back-tracks to the
parent production rule of the current
production rule and continues processing.
Confirmation of execution results is done by
generating self-checking code according to
the ‘insert-checking-routine’.

3) A source text for a <program> is generated
by the above procedure. If a generated
program includes a complex program
structure such as a loop, there are several
problems to be resolved, which will be
explained later.
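
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TPGEN itself: the overflow bound `LIMIT`, the variable names `I` and `J`, the set of operators, and all function names are hypothetical, and the "grammar" is reduced to a single assignment rule.

```python
import random

LIMIT = 2**31 - 1  # assumed overflow bound of the simulated target machine


class Abnormal(Exception):
    """Every alternative of a production rule simulated abnormally."""


OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def gen_expr(env, rng, ops=("+", "-", "*")):
    """Expand <expr> -> <primary> op <primary>, simulating each alternative."""
    a, b = rng.choice(sorted(env)), rng.choice(sorted(env))
    untried = list(ops)
    rng.shuffle(untried)
    for op in untried:                    # each alternative is tried at most once
        value = OPS[op](env[a], env[b])
        if abs(value) <= LIMIT:           # simulation finished normally
            return f"{a} {op} {b}", value
    raise Abnormal(f"no alternative works for {a}, {b}")


def gen_assign(env, rng, retries=20):
    """Expand <assign-stmt>; if <expr> fails, back-track and expand it again."""
    for _ in range(retries):
        try:
            text, value = gen_expr(env, rng)
        except Abnormal:
            continue                      # the parent rule retries with new primaries
        target = rng.choice(sorted(env))
        env[target] = value               # keep the simulated state up to date
        return f"{target} = {text}"
    raise Abnormal("<assign-stmt> could not be generated")


def gen_program(n_stmts, seed=0):
    """Generate n_stmts assignments and return them with the simulated state."""
    rng = random.Random(seed)
    env = {"I": 2_000_000_000, "J": 7}
    lines = [gen_assign(env, rng) for _ in range(n_stmts)]
    return lines, env
```

The simulated values in `env` are what a generator of this kind would later embed in the self-checking code, so the generated text and its expected results are produced together.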


2.1 Resolving context-dependency 

If a test program is generated by selecting 
production rules completely at random, variables 
or functions defined in the declaration portion 
will not coincide with those used in the 
execution portion. In order to resolve such 
context-dependency, information concerning de- 
clared variables must be easily retrieved. In 
TPGEN, system functions are available which 
make it easy to store and retrieve information 
concerning declared variables. 

Such information is considered to belong to 
a specific non-terminal ($PROGRAM in the 
appendix). For example, if a variable is declared 
in a declarative statement, its name, data type 
and other information is registered to that 
non-terminal using a system function. If a 
variable is assigned a value by an assignment 
statement, the value of that variable is updated 
using another system function. 
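
A minimal sketch of such a store is given below. The class and method names are hypothetical stand-ins for the system functions mentioned above, which the paper does not list at this point.

```python
class ProgramInfo:
    """Hypothetical per-program store for declared variables."""

    def __init__(self):
        self.vars = {}

    def declare(self, name, dtype):
        # Called when a declarative statement is generated.
        self.vars[name] = {"type": dtype, "value": None}

    def set_value(self, name, value):
        # Called when an assignment statement is generated and simulated.
        self.vars[name]["value"] = value  # KeyError if the name was never declared

    def referable(self, dtype):
        # Only variables of the right type that already hold a value may be referenced.
        return [n for n, v in self.vars.items()
                if v["type"] == dtype and v["value"] is not None]


info = ProgramInfo()
info.declare("IJK1", "INTEGER")
info.declare("IJK2", "INTEGER")
info.set_value("IJK1", 150)
print(info.referable("INTEGER"))
```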

In generating an expression, the specific
data type is passed to the production rule of an
expression as an inherited attribute. In
generating a subscript expression, its range or
expected value is passed to the rule of an
expression, and if the value of the generated
expression is not appropriate, we usually specify
that generation be repeated for a fixed period of
time until it is appropriate. A back-tracking
mechanism is very useful in such a situation.
This will be explained later.

The production rule of subroutines is 
invoked from the semantic definition of the 
“CALL” statement, receives the necessary 
information (subroutine name and parameters) 
from it, and generates an appropriate subroutine. 
Normal execution of that subroutine is assured 
for the current “CALL” statement by simulating 
its execution at the time of generation. 

The production rule of the “CALL” statement
includes two alternatives. One is to
generate a “CALL” statement for already
generated subroutines, and the other is for a new
subroutine. When a new subroutine is generated,
the information needed to generate it must be
stored with its name in a global variable so that
other “CALL” statements for it can be generated
later.

The generation of a subroutine at the time
of generating a “CALL” statement, however,
requires placing that subroutine at an appropriate
point inside the generated test program. This
problem is resolved by separating text generation
and its arrangement. Syntax definition of
such a production rule simply states the
arrangement of generated text (syntax elements),
and the generation of text is done by semantic
definition (see the appendix).


2.2 Self-checking code 

Automatic checking of execution results is 
very important in the inspection and testing of 
our software. We have been using checking 
routines for many years, before TPGEN was 
introduced. We have checking routines for each 
type of variable and for each target language. 
The checking routines themselves are coded in 
each target language. In Fig. 2, they receive, as 
parameters, a sequential number to identify 
erroneous text, a simulated value of the variable 
to be checked, the variable to be checked, and
error message text.

A test program generated by TPGEN is 
executable once it is link-edited with the above 
checking routines. 
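
To make the role of the checking routine concrete, the following Python sketch mirrors Fig. 2a). It is an illustration only: the real checking routines are written in each target language, and the function `check` and the list `failures` are hypothetical names.

```python
failures = []


def check(item_no, expected, actual, message):
    """Compare the generator's simulated value with the value computed at run time."""
    if expected != actual:
        failures.append((item_no, expected, actual, message))


# Python rendering of the generated text in Fig. 2a)
ijk1 = 150
if ijk1 > 100:
    ijk1 = ijk1 - 60
    check(1, 90, ijk1, "ASSIGN STMT INVALID")
else:
    ijk2 = 30
    check(2, 30, ijk2, "ASSIGN STMT INVALID")

print(failures)
```

If the language processor under test miscompiled the subtraction, the run-time value would differ from the simulated value 90 and item 1 would be reported.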

Although TPGEN understands values of all
variables, users of TPGEN must specify the
position where the self-checking code should be
inserted, and the way it should be inserted. One
reason is that TPGEN does not understand the
structure of the target language. For example, if
the “THEN” clause of an “IF” statement
consists of one assignment statement, insertion
of the self-checking code inside the “THEN”
clause may require, in some languages, grouping
of these statements. But the current TPGEN
does not understand the target language to that
extent.

Insertion of a self-checking routine is done
for specified variables based on the definition of
the ‘insert-checking-routine’ as shown in Fig. 1
or CHECK, which is described in the appendix.
Figure 1 specifies the self-checking code to be
inserted at the end of each assignment statement.
In Fig. 2a), such code is inserted in the
“THEN” and “ELSE” clauses. The
“ELSE” clause is not executed in this example,
but such an insertion is done assuming the
“ELSE” clause will be executed. Another test
program designer may specify the insertion of
checking routines at the end of the “IF”
statement for all variables whose values are
changed during the execution of the “IF”
statement. In this case, the designer must determine
which variables have changed from the
difference between the values of variables on
entrance to the “IF” statement and their values
on exit from the “IF” statement, and a
program such as the one in Fig. 2b) is generated.

If the assignment statements in Fig. 2a) are
included in a loop, the definition of CHECK in
the appendix is not enough to generate correct
self-checking code. Inserting self-checking
code inside a loop requires identifying the repetition
in addition to the value of the variable at that
repetition, and this information must be included
in the definition of ‘insert-checking-routine’.

Validation of the contents of external files is
done by validating the variables into which a
record of that file is read.

Test programs generated by TPGEN thus
have self-checking code, and if a test program is
executed correctly, it is discarded
and only the information concerning what kind
of functional test was done is stored in the
database.


3. Characteristics of TPGEN 

TPGEN generates test programs as described
above. However, we also added the
following features in order to make the quality
of generated test programs closer to that of
those generated manually.


3.1 Preventing abnormal execution by back-tracking

In TPGEN, an expression is evaluated each
time a production rule is applied, and if an
abnormal event is detected, TPGEN randomly
selects an alternative rule, excluding already
selected rules. If an abnormal event is detected
for all such selections, TPGEN back-tracks to its
parent production rule, and text generation and
its simulation continue. But this approach has
some problems. One problem is that even if some
selection, for example, (5-2) of Fig. 1, results in
abnormal execution, it may execute
normally if it is selected again,
because a different <primary> will be selected.


[Figure: a generated “DO I = 1, 1000, 1” loop; from the back-tracking point in the <stmt> rule, TPGEN back-tracks (select alternatives) and regenerates the loop body as an <if-stmt>.]

Fig. 3— An example of back-tracking caused by a loop.
A similar problem exists when such an 
expression is included inside a loop. In a loop 
(see Fig. 3), an assignment statement is gener- 
ated so that no abnormal execution will occur 
for the first repetition. But the “DO” statement 
may cause abnormal execution for that assign- 
ment statement, at a later repetition. If it does, 
TPGEN considers that the “DO” statement, not 
the assignment statement, is executed abnor- 
mally. So, if the back-tracking point is specified 
inside the production rule of the “DO” statement, 
as in Fig. 3, the generation environment is
restored to the specified point, and the “DO”
statement (body of the “DO” statement) is
generated and simulated a fixed number of
times until it is executed normally. If a
back-tracking point is specified at the top of the
“DO” statement, generation of the “DO”
statement itself is repeated a fixed number of
times until it is executed normally. If no
back-tracking point is specified for the “DO” 
statement, some statement other than the “DO” 
statement will be selected as an alternative to 
the current “DO” statement. 

Specifying back-tracking points is delicate
work. Users must make the scope of back-tracking
as narrow as possible so that a wide
variety of programs will be generated, and they
must, at the same time, reduce the frequency of
back-tracking to improve generation efficiency.

Detection of infinite loops is done by 
counting repetition numbers. Since the introduc- 
tion of “GOTO” statements makes it difficult to 
design test programs with no infinite loops, we 
usually design test programs which include 
“GOTO” statements separately. 
3.2 Assuring functional coverage of generated programs

The combinations of selecting production
rules can become enormous, even infinite,
because of the nested or recursive structure of
the target language. Thus, generation of test
programs based on a random selection of
production rules cannot answer such questions
as “Are the generated test programs enough to
cover the functionality of the target language?”

In addition to random selection of produc- 
tion rules, TPGEN tries to assure the following 
coverage of generated programs: 

Condition-2 (2-level combination): For each
alternative of each production rule, TPGEN tries
to assure generation of all combinations of all
alternatives of the non-terminals which are included
in that rule. For example, consider the
<if-stmt> in Fig. 1. The <b-expr>, the
<stmt> of the “THEN” clause, and the
<stmt> of the “ELSE” clause each consist of three
alternatives, so twenty-seven (3 x 3 x 3) combinations
should be selected for <if-stmt>.
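
The count of twenty-seven can be reproduced with a short Python sketch. The three alternatives listed for <stmt> are an assumption made to match the count stated in the text (the simplified grammar of Fig. 1 shows only two), and the bookkeeping names `covered`, `record`, and `coverage` are hypothetical.

```python
from itertools import product

# Alternatives of each non-terminal on the right side of <if-stmt>.
alternatives = {
    "b-expr": [".GT.", ".LT.", ".EQ."],
    "stmt": ["assign-stmt", "if-stmt", "do-stmt"],  # assumed set of three
}
if_stmt_rhs = ["b-expr", "stmt", "stmt"]  # condition, THEN clause, ELSE clause

# Every combination that condition-2 asks the generator to produce.
combos = set(product(*(alternatives[nt] for nt in if_stmt_rhs)))

covered = set()


def record(choice):
    """Remember one generated combination of alternatives."""
    if choice in combos:
        covered.add(choice)


def coverage():
    """Fraction of the required combinations generated so far."""
    return len(covered) / len(combos)


record((".GT.", "assign-stmt", "assign-stmt"))
print(len(combos), coverage())
```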

This metric is based on syntax definition 
only, and it is usually not possible to generate 
test programs so that they satisfy condition-2 for 
all production rules. We adopted condition-2 for 
the following reasons: 

1) As we cannot do complete functional 
testing, the second best approach is to make 
clear what kinds of functional tests are done 
by the generated programs. 

2) Condition-2 above is, in a sense, close to the 
method which is actually used in designing 
test cases manually’ '”. Thus, we can 
expect that the quality of test programs 
generated by TPGEN is close to that of 
those made manually. 


3.3 Other features 

The following additional features have been 
introduced to make TPGEN more practical. 
1) Weights 

A facility to control weights or relative 
frequency of each of the possible alternatives is 
introduced in the reports” ®. If one particular 
type of statement has a high weight, it will 
appear densely in the generated text. In TPGEN, 
weights are introduced in the following way:

    <stmt> -> W1 <assign-stmt> !
              W2 <if-stmt> &

where $W_1 / \sum_i W_i$ is the probability of selecting
<assign-stmt>.
Here, W1 and W2 may be expressions, and may
be changed dynamically:

    <stmt> -> 200 <assign-stmt> !
              (E:SELECT :INIT 100 :IF-SELECTED (-30)) <if-stmt> &

The weight of <if-stmt> is read as follows: the
relative frequency is set to 100 initially, then it is
decreased by 30 each time <if-stmt> is
selected. This enables us to change the selection
frequency of <if-stmt> to zero at the time the
nesting level of the IF statement reaches the
maximum allowed by the target language
processor.
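
The weighting scheme can be sketched in Python as follows. This is an assumption-laden illustration rather than TPGEN's mechanism: the class `DecayingWeight`, the functions `probability` and `select`, and in particular the clamping of the weight at zero are hypothetical choices.

```python
import random


class DecayingWeight:
    """A weight that starts at `init` and changes by `step` on each selection."""

    def __init__(self, init, step):
        self.value = init
        self.step = step

    def selected(self):
        # Assumption: the weight is clamped at zero rather than going negative.
        self.value = max(0, self.value + self.step)


def current(weight):
    return weight.value if isinstance(weight, DecayingWeight) else weight


def probability(alternatives, name):
    """W_name divided by the sum of all current weights."""
    total = sum(current(w) for _, w in alternatives)
    return current(dict(alternatives)[name]) / total


def select(alternatives, rng):
    """Pick one alternative according to the current weights, then update them."""
    names = [n for n, _ in alternatives]
    weights = [current(w) for _, w in alternatives]
    choice = rng.choices(names, weights=weights, k=1)[0]
    chosen = dict(alternatives)[choice]
    if isinstance(chosen, DecayingWeight):
        chosen.selected()
    return choice


if_weight = DecayingWeight(init=100, step=-30)
stmt_rule = [("assign-stmt", 200), ("if-stmt", if_weight)]
print(probability(stmt_rule, "assign-stmt"))
```

With these numbers the weight of <if-stmt> goes 100, 70, 40, 10, 0, so after four selections only <assign-stmt> can be chosen.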
2) Special terminals for formatting control 

In FORTRAN, each line must start at 
column 7. We usually use indentation for nested 
IF statements. To cope with these matters, 
TPGEN has special terminals for controlling the 
position of generated text. This also improves 
the readability of generated text. 


4. Evaluation of TPGEN 

TPGEN has been used for more than 3 years
in our quality assurance department for several
language processors, including FORTRAN, C,
LISP, PROLOG, an AI-oriented shell, sort-merge,
and COBOL-embedded SQL. At the time of their
functional enhancement, these products were
inspected partially, using TPGEN with about
2 800 production rules and more than 20 million
LOC (lines of code) of generated programs.

1) Applicable range of TPGEN 

TPGEN is effective for generating test 
programs which execute normally. In the case of 
FORTRAN, about 80% of the normal functional
testing can be done using TPGEN. TPGEN is
not effective for functional testing of special 
functions such as IT’ function and special files 
such as VSAM files. 

In the case of SQL, most of the functional
testing for data manipulation language (DML)
could actually be done using TPGEN, but
TPGEN is not effective for data definition
language (DDL). In the case of DML, the test
program is designed based on a database whose
structure is predetermined by a test case
designer, but testing of DDL requires making a
variety of databases, and it is difficult in our
current environment to make a few simple
self-checking routines for such varying data
structures.

2) Quality of generated test programs 

The functional coverage of TPGEN is 
basically the same as our conventional method, 
but we found that test programs generated by 
TPGEN have better bug-detection characteris- 
tics. The main reason, we think, is the 
complexity of generated programs. We compared
SQL test programs generated by TPGEN with
those made by conventional methods
and found that the number of
tokens included in a single SQL statement is
about 3.5 times the number in those made by
conventional means. We also found that the
number of phases and predicates and the depth of
nested expressions increased.

In designing test cases manually, we often 
specify that some testing factor may be optional, 
because we think such a factor is not important 
for such a test case. But this is potentially a big 
problem, and bugs often exist in places where we 
think there are no problems. TPGEN generates 
test programs randomly without any precon- 
ceived ideas. This is the key point of a random 
testing tool. 

3) Efforts required to make test programs 
using TPGEN 

Our experience shows that making test
programs using TPGEN is five times easier than
with conventional methods. Using TPGEN, most of
our labor is devoted to designing test cases. The
simple, tedious work of coding the test programs
is left to TPGEN, and our time can be spent on
other work such as inspecting the ease of use
and performance.

4) Performance 

TPGEN requires a fair amount of CPU time
and memory. It takes two or three seconds of
CPU time on Fujitsu’s large computer M-780 to
generate test programs of about 1 kilo LOC for
a programming language which has no loops,
such as SQL, and about 10 seconds for a
programming language such as FORTRAN,
where we need heavy testing of loops, which
often require back-tracking for selecting
alternative production rules.


000001 $PROGRAM ->
000002     STMT "@RNL"
000003     "STOP" "@RNL"
000004     "END" #%
000005     (E:EXP-R STMT :IN-TERM "@RNL") &
000006 STMT ->
000007     200 ASSIGN-STMT !
000008     100 IF-STMT &
000009 ASSIGN-STMT ->
000010     VAR "=" EXPR "@RNL"
000011     CHECK #%
000012     (E:EXPL VAR)
000013     (E:EXPL EXPR)
000014     (M:SEM (F:VAR-SET VAR.V EXPR.V))
000015     (E:EXPL CHECK VAR.V "ASSIGN STMT INVALID") &
000016 IF-STMT ->
000017     "IF (" B-EXPR ") THEN" "@TAB+"
000018     STMT#1 "@TAB-"
000019     "ELSE" "@TAB+"
000020     STMT#2 "@TAB-"
000021     "ENDIF" #%
000022     (E:EXPL B-EXPR)
000023     (IF (NOT B-EXPR.V) (F:EFFECT-PART-CUT))
000024     (E:EXPL STMT#1)
000025     (IF B-EXPR.V (F:EFFECT-PART-CUT))
000026     (E:EXPL STMT#2) &
000027 EXPR ->
000028     PRIMARY !
000029     PRIMARY#1 "+" PRIMARY#2 % (F:V-SET (+ PRIMARY#1.V PRIMARY#2.V)) !
000030     PRIMARY#1 "-" PRIMARY#2 % (F:V-SET (- PRIMARY#1.V PRIMARY#2.V)) &
000031 B-EXPR ->
000032     PRIMARY#1 ".GT." PRIMARY#2 % (F:V-SET (> PRIMARY#1.V PRIMARY#2.V)) !
000033     PRIMARY#1 ".LT." PRIMARY#2 % (F:V-SET (< PRIMARY#1.V PRIMARY#2.V)) !
000034     PRIMARY#1 ".EQ." PRIMARY#2 % (F:V-SET (= PRIMARY#1.V PRIMARY#2.V)) &
000035 PRIMARY ->
000036     REF-VAR % (F:V-SET (F:VARV REF-VAR)) !
000037     CONST &
000038 CHECK(VAR COMMENT) ->
000039     "CALL CHECK(" ITEM-NO "," A "," B "," C ")" #%
000040     (E:EXPL ITEM-NO)
000041     (E:PN A (F:VARV VAR))
000042     (E:PN B VAR)
000043     (E:PN C COMMENT) &
000044 VAR ->
000045     #P (CAR (F:RANDOM-SELECT (F:ALL-DECL-VAR))) !
000046     #P (GENSYM "IJK") (F:DECL P0.V #'INTEGERP) &
000047 REF-VAR -> #P (CAR (F:RANDOM-SELECT (F:ALL-VAR))) &
000048 CONST -> #P (F:RANDOM 1 200) &
000049 ITEM-NO ->
000050     A #%
000051     (IF (NOT ITN) (SETQ ITN 1))
000052     (E:PN A ITN)
000053     (SETQ ITN (1+ ITN)) &
000054 (SETQ *CVS* '(ITN))


Fig. 4— Test program specification written in TPGEN for a small subset of FORTRAN.


5. Conclusion 

The test program generator TPGEN, which
is based on attributed grammar, has succeeded
in generating test programs which assure a
specific testing coverage and have testing quality
as good as or better than manually produced
ones.

TPGEN has an additional merit that is
important in our practical work. In our quality
assurance work, we sometimes find that a
software product has poor quality. We request
the development group to take drastic measures
to correct it. Later, when we receive the revised
software product, we inspect it again. The same
test set loses some of its capability for quality
assurance in this case. With TPGEN, however,
we can generate another test set. Thus we can
easily check the quality of the new product.

Testing using TPGEN is so-called ‘black
box’ testing. We usually need to employ many
kinds of tests, including ‘white box’ testing, for
software product testing. We have not evaluated
TPGEN from the point of view of ‘white
box’ testing. Moreover, a simple definition of the
syntax and semantics of a programming language
is not enough as input to TPGEN; the current
TPGEN also requires descriptions such as the
insertion of checking routines and the specification
of back-tracking positions.

The authors are grateful to Dr. Tokuda of
Tokyo Institute of Technology for his helpful
comments and suggestions on an earlier version
of this paper.


6. Appendix 

A detailed definition of a small subset of
FORTRAN is shown in Fig. 4. The following is
an explanation of this figure.

1) The data declaration and its related
initialization are omitted.

2) The syntax definition is on the left side of
“%” or “#%” and the semantic definition is
on the right side of “%” or “#%”. The
numbers at the top of the syntax definition
(see the definition of STMT) define the
relative selection frequency of that rule
(default is 100).

3) “@RNL” specifies the column where the
generated text is placed. “@TAB+” and
“@TAB-” indicate a carriage return and a
shift of output position by a specified
number of columns (default is 2) to the right
or left, respectively.

4) In the case of “%”, text is generated
according to the syntax definition, and then
the semantic definition is evaluated. The
semantic definition of EXPR (line 29) means
that the value attribute of the non-terminal
EXPR should be set to the sum of the value
attribute of PRIMARY#1 and that of
PRIMARY#2. “F:V-SET” is a system
function which evaluates the value of its
argument and registers it as the value
attribute of the left side non-terminal. The “#n”
is a sequential number to identify a
non-terminal which appears more than once
in one syntax definition. If no semantic
definition is described, as in line 28, the
value attribute of the left side non-terminal
is set to that of the right side non-terminal.

5) In the case of “#%”, the semantic definition
is evaluated first, and then the text is
arranged according to the syntax definition
using text which is generated in the course
of the evaluation of the semantic definition.
Such a semantic definition includes descriptions
which control the expansion of production
rules. For example, the semantic definition of
$PROGRAM specifies that STMT be expanded
repeatedly, a random number of times (each
STMT is separated by “@RNL”).

6) In our actual implementation, a semantic
definition which specifies simulation of the
execution of the generated text must be
explicitly stated by writing the “M:SEM”
function as described in line 14, because such
a definition may be executed more than
once if the generated program includes a
loop. But such description is omitted in this
example except line 14.

7) The semantic definition of IF-STMT is a little
complicated, because it contains a clause
which is not executed. Simulation of such a
not-executed clause is done in the same way
as an executed clause, but the simulation
environment (values of generated variables)
must be restored, on exit from such a
not-executed clause, to the state it had on
entrance to that clause. Saving
and restoring such a simulation environment is
specified by “F:EFFECT-PART-CUT”.
Line 23 means that if the value of B-EXPR
is false, then the current simulation
environment is saved, and it is restored
after line 24 has been evaluated.

8) In the definition of the assignment statement,
the value attribute of the left side
non-terminal (VAR) is the name of some
variable and the value attribute of the
expression (EXPR) is the value of that
expression. On line 14, “F:VAR-SET” is a
system function which retrieves the specified
variable (VAR.V) among the information
which is stored in $PROGRAM and
sets its value field to the second argument
(EXPR.V). Line 15 expands CHECK by
passing two parameters (the value attribute of
VAR and the string constant “ASSIGN STMT
INVALID”).

9) The definition of CHECK is read as follows:
the CALL statement is expanded according
to the syntax definition after evaluation of
the semantic definition, which expands
ITEM-NO (a sequential number that
identifies the self-checking code), then A is
set to the simulated value of VAR, B is set
to VAR, and C is set to COMMENT.
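
The save-and-restore behaviour described for the IF statement can be sketched in Python as follows. This is an illustration under assumptions, not the implementation of the system function named in the appendix: the function `simulate_if` and the representation of a statement as a `(target, function)` pair are hypothetical.

```python
import copy


def simulate_if(env, condition, then_stmts, else_stmts):
    """Simulate both clauses of an IF; keep only the effects of the executed one."""
    for taken, stmts in ((condition, then_stmts), (not condition, else_stmts)):
        saved = copy.deepcopy(env)      # environment on entrance to the clause
        for target, fn in stmts:
            env[target] = fn(env)       # the clause is simulated either way
        if not taken:
            env.clear()
            env.update(saved)           # cut the effects of the not-executed clause
    return env


# The IF statement of Fig. 2a): IJK1 = IJK1 - 60 in THEN, IJK2 = 30 in ELSE.
then_part = [("IJK1", lambda e: e["IJK1"] - 60)]
else_part = [("IJK2", lambda e: 30)]

env = {"IJK1": 150}
simulate_if(env, env["IJK1"] > 100, then_part, else_part)
print(env)
```

Simulating the not-executed clause still yields the values needed for its self-checking code, while the restore step keeps those values from leaking into the statements that follow.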
Binary Decision Diagram (BDD) is now widely used in CAD fields, especially in
formal verification and logic synthesis. In this paper, variable ordering methods of
BDD for the application of multi-level logic minimization are presented. The variable
ordering algorithm for sum-of-products representation is based on cover patterns
and selects the most binate variables first, and the one for multi-level logic
representation is based on depth-first traversal of circuits. In both cases, the
obtained variable orderings are optimized by exchanging a variable with its neighbor
in the ordering. Experimental results show the effectiveness of our methods.


1. Introduction 

In logic synthesis, multi-level logic minimi- 
zation plays a very important role in order to 
increase the quality of synthesized circuits in 
terms of area and testability. There have been 
many efforts in developing effective and efficient 
multi-level logic minimization methods, and 
several logic synthesis systems which include 
multi-level logic minimization have been devel- 
oped'~*. In all of them, the key point of 
multi-level minimization is the use of don’t care 
sets; i.e., people have been paying lots of
attention to how to effectively use don’t care 
sets and how to keep the size of don’t care sets 
manageable. We have developed a multi-level 
logic minimization program’ based on the
transduction method’ using Binary Decision 
Diagram (BDD)” as an internal representation of 
logic functions. BDD is a canonical repre- 
sentation of logic functions. BDD has obtained 
much attention, since it can represent practical 
logic functions like the ones used in ALUs much 
more compactly than other representations, such 
as sum-of-products representation. Much larger 
circuits can be minimized using BDD compared 
with the original transduction method’ which 
uses truth tables to represent logic and permissible
functions. We also developed a Boolean
resubstitution algorithm with permissible func- 
tions”, which can be considered as an extension 
of the transduction method. Permissible func- 
tions are defined on each gate and express don’t 
care sets which do not change the values of 
primary outputs. We used BDD to represent 
permissible functions compactly and get equal or 
superior performance compared with other 
multilevel logic minimization programs, such as 
MIS and BOLD, especially for large circuits. 

The performance of our synthesis method, 
however, highly depends on sizes of BDDs. Sizes 
of BDDs greatly depend on the variable 
orderings used, especially for large circuits. In 
this paper, we present methods to find good 
variable orderings for BDDs with application to 
logic synthesis in mind. The problem of finding 
the best variable ordering is NP-hard®, and a 
couple of heuristics for good variable ordering 
were proposed. In Refs. 9 and 10, variable
ordering methods based on network topology
were developed. Here we use the approach that


we first generate an initial variable ordering and 
then try to optimize it. 
Initial variable orderings are generated in
two different ways: if the synthesis program
receives circuit descriptions in sum-of-products
representation, the variable orderings are generated
by analyzing cover patterns, and if the
synthesis program receives circuit descriptions in
multi-level logic representation, the variable
orderings are generated by traversing the circuits
in a depth-first way as shown in Ref. 9. Both these
situations can happen in logic synthesis. In some
cases, the specification for a circuit is in a truth
table format, and in other cases, designers want
to specify circuits with many intermediate
variables, which is actually a multi-level logic
representation. Sometimes designers want to optimize
their circuit designs using multi-level logic
minimization methods, in which case the input to
logic synthesis systems is also in multi-level logic
representation.

The initial orderings are optimized in the 
following way: First we construct BDDs for 
logic functions using the initial orderings, and 
then minimize sizes of BDDs by exchanging a 
variable with its neighbor in the ordering. The 
resulting orderings are used to calculate permis- 
sible functions for multi-level minimizations. 

Since the sizes of BDDs highly depend on
variable orderings, minimization time is also
drastically influenced by the variable ordering
used, although the quality of minimization
results does not change. The time required for
generating and optimizing initial orderings
is much less than that for multi-level
minimization. We present experimental results
and show that the presented methods give a
large speed-up.

In chapter 2, we briefly review permissible 
functions expressed in BDDs. In chapter 3, we 
present the method for initial ordering genera- 
tion. In chapter 4, we present the method of BDD 
minimization after constructing BDDs for logic 
functions. Chapter 5 shows experimental results, 
and finally chapter 6 gives concluding remarks.


2. Boolean resubstitution with permissible 
functions and BDD 
In this chapter, we briefly review the two
key issues used in our multi-level logic minimization
methods: permissible functions and BDD.
For details, please see Refs. 5-7.


[Figure: example circuit with truth table; the set of permissible functions shown is [00**], where $PF_i$ denotes a set of permissible functions.]

Fig. 1— An example of permissible functions.


2.1 Permissible functions 

The key concept of permissible functions is
that each node in a circuit is an incompletely
specified logic function of the primary inputs,
because of the don’t care sets obtained from the
network topology, and permissible functions
represent the possible implementations at such
nodes. Permissible functions are defined on each
node (a primary input, a gate, or a primary
output) in the circuit. They are defined as
follows. Assume v_i is an intermediate node in a
network. Suppose that the logic function of every
output variable in the network does not change
even when the logic function F_i of node v_i is
replaced with another logic function PF_i. Then
the logic function PF_i is called a permissible
function of node v_i.

Usually, there is more than one permissible 
function for a node. Therefore, the don’t care 
mark (*) is used to represent a set of permis- 
sible functions. Figure 1 shows an example of 
permissible functions. In this figure, v,; and v, 
are input nodes, v, is an output node, and vy, is 
an intermediate node. The F’ vector in the truth 
table represents a logic function of each node v,. 
G, is an OR gate. Since the first and second 
values of F’, are Os, the first and second values 
of F, must remain to be Os. The third and fourth 
values of F/, are ls and the third and fourth 
values of Ff, are ls. Thus, the third and fourth 
values of F’, may be either 0 or 1, and the logic 
function of F’, does not change even when logic 
function F’, is replaced with a logic function in 


FUJITSU Sci. Tech. J., 29, 2, (June 1993) 


M. Fujita, and Y. Matsunaga: Variable Ordering of Binary Decision Diagrams for Multi-Level Logic Minimization 


Fig. 2 - BDD representation of F = v1 & v2 + v3.


PF. Then PF’, is the set of permissible functions 
of v;. Though the logic functions and permissible 
functions in this figure are represented in terms 
of truth tables, in our implementation these are 
represented as BDDs (see below). Permissible
functions can be calculated by traversing the
network from outputs to inputs. The details can
be found in Refs. 4 and 5.


2.2 Binary decision diagrams 

BDDs, sometimes called ordered BDDs, were
proposed by Bryant. A BDD is a kind of decision
graph for representing logic functions, with a
restriction on the ordering of variables in the
graph. Boolean functions are represented by
directed acyclic graphs whose vertex set contains
two types of vertices. A non-terminal vertex has
as attributes an input variable index and two
children. A terminal vertex has as its attribute
a constant value 0 or 1 (to express permissible
functions, we added one more constant, ‘*’, for
the don’t care value). Ordered means that if
x_i < x_j, then all nodes with x_i precede all
nodes with x_j. A path from the root to the
terminal vertex with value 0 (or 1) gives a
condition under which the logic function f = 0
(or f = 1).

Figure 2 shows an example of the BDD
representation of a logic function
F = v1 & v2 + v3, where “&” represents AND and
“+” represents OR. In this figure, a rectangle
indicates a terminal node with a logical value,
and a circle indicates a non-terminal node
containing the variable index, with the two
children indicated by branches labeled 0 and 1.
The variable ordering of this graph is
v1 < v2 < v3. Bryant developed efficient
procedures for operations on BDDs. Those
operations take time proportional to the sizes
of the BDDs.

Fig. 3 - Variable ordering: a) best case,
b) worst case.
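
The vertex structure described above can be made concrete with a small sketch. The code below is only an illustration, not the authors' implementation: it builds the BDD of F = v1 & v2 + v3 under the ordering v1 < v2 < v3 by exhaustive expansion (practical only for a few variables), and the names `mk`, `build`, `evaluate` and `vertices` are assumptions introduced here. Terminal vertices are the integers 0 and 1, and a non-terminal vertex is a tuple holding a variable index and two children.

```python
def mk(var, lo, hi):
    """Create a vertex; a vertex with two identical children is dropped."""
    return lo if lo == hi else (var, lo, hi)

def build(f, n, var=1, fixed=()):
    """Build a BDD of f(v1..vn), expanding variables in the order v1 < ... < vn."""
    if var > n:
        return int(f(*fixed))
    lo = build(f, n, var + 1, fixed + (0,))
    hi = build(f, n, var + 1, fixed + (1,))
    return mk(var, lo, hi)

def evaluate(node, values):
    """Follow one path from the root to a terminal; values[i-1] is the value of v_i."""
    while isinstance(node, tuple):
        var, lo, hi = node
        node = hi if values[var - 1] else lo
    return node

def vertices(node, seen=None):
    """Collect the distinct vertices reachable from node, terminals included."""
    seen = set() if seen is None else seen
    if node not in seen:
        seen.add(node)
        if isinstance(node, tuple):
            vertices(node[1], seen)
            vertices(node[2], seen)
    return seen

F = lambda v1, v2, v3: (v1 and v2) or v3
bdd = build(F, 3)
print(bdd)                  # nested tuples: (variable index, 0-child, 1-child)
print(len(vertices(bdd)))   # number of distinct vertices
```

Equal tuples stand for one shared vertex, so counting distinct tuples gives the graph size that the text refers to.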

Although BDDs seem to be very promising,
there is a big problem which must be resolved
before BDDs can be applied to various areas: the
variable ordering problem. The graph size depends
heavily on the variable ordering. Figure 3 shows
two different BDDs for the same logic function
using different variable orderings. In Fig. 3a),
the best variable ordering is used, and in
Fig. 3b), the worst variable ordering is used.
As can be seen in the figure, as the number of
2-input AND gates increases (the number of input
variables also increases proportionally), the
size of the resulting BDD (the number of vertices
in the BDD) increases exponentially with the
worst ordering, while this increase can be
restricted to polynomial order if we use the best
ordering. With a good ordering, a BDD remains
reasonably small for logic functions which need
exponential sizes in sum-of-products
representation.

Fig. 4 - Shared BDD with negative edges:
F1 = a&b + ~a&~b, F2 = a&~b + ~a&b;
b) shared BDD, c) shared BDD with negative edges.
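
The dependence of BDD size on the variable ordering can be reproduced on a small scale. The sketch below is an illustration under assumptions: it takes the function to be an OR of k two-input AND gates, as the description of Fig. 3 suggests, and it builds each BDD by exhaustive expansion, so it is only usable for a handful of variables. The names `bdd_size`, `or_of_ands` and `sizes` are hypothetical.

```python
def bdd_size(f, order):
    """Number of non-terminal vertices of the reduced ordered BDD of f.

    f takes a dict {variable name: 0 or 1}; order lists the variable
    names from the root downwards.
    """
    def build(level, env):
        if level == len(order):
            return int(f(env))
        name = order[level]
        lo = build(level + 1, {**env, name: 0})
        hi = build(level + 1, {**env, name: 1})
        return lo if lo == hi else (name, lo, hi)

    seen = set()

    def walk(node):
        if isinstance(node, tuple) and node not in seen:
            seen.add(node)
            walk(node[1])
            walk(node[2])

    walk(build(0, {}))
    return len(seen)

def or_of_ands(pairs):
    """f = a1&b1 + a2&b2 + ... for the given (a, b) name pairs."""
    return lambda env: int(any(env[a] and env[b] for a, b in pairs))

def sizes(k):
    """BDD sizes for k AND gates under an interleaved and a separated ordering."""
    pairs = [(f"a{i}", f"b{i}") for i in range(1, k + 1)]
    f = or_of_ands(pairs)
    interleaved = [v for pair in pairs for v in pair]            # a1 b1 a2 b2 ...
    separated = [a for a, _ in pairs] + [b for _, b in pairs]    # a1 a2 ... b1 b2 ...
    return bdd_size(f, interleaved), bdd_size(f, separated)

for k in (2, 3, 4):
    print(k, sizes(k))
```

With the interleaved ordering the size grows linearly in k, while with the separated ordering it roughly doubles with each added gate.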

A graph can be shared among many logic
functions and permissible functions, and a
negative edge can be used to indicate
complemented logic. This improvement enables a
graph to be copied merely by copying a pointer.
The effective use of graph sharing and negative
edges reduces CPU time and memory significantly.
Figure 4 shows an example of a shared BDD with
negative edges.

Although we use shared BDDs with negative
edges in the actual implementation, the original
BDD representation is used in the following
presentation for simplicity.


3. Generation of initial variable orderings 

In the minimization process, BDDs for both
logic functions and permissible functions are
constructed and used. Generally, the BDDs for
logic functions are much smaller (by a factor of
around 10 to 100) than the BDDs for permissible
functions. Our experiments also indicate that a
variable ordering which gives smaller BDDs for
logic functions also gives smaller BDDs for
permissible functions. This means that if we can
get good variable orderings for the BDDs of logic
functions, we can also use them effectively for
the BDDs of permissible functions.

Here, we first construct BDDs for logic 
functions using the initial variable orderings 
generated by the heuristics, and then minimize 
sizes of BDDs by exchanging a variable with its 
neighbor in the ordering. In this chapter, we 
show the methods to generate initial variable 
orderings. 

We have developed a variable ordering
algorithm based on the following heuristic:
minimize the number of net crossings when the
circuit diagram is drawn. Applying it to the
ISCAS test generation benchmark circuits showed
experimentally that this method is very effective
for multi-level circuits.

Here we use two different methods for
different circuit types. If the circuit is
initially given in sum-of-products
representation, we use a method which analyzes
cover patterns and orders the most binate
variables first. If the circuit is initially
given in multi-level logic representation, we use
the above heuristic.

The variable ordering method for circuits in
sum-of-products representation is very simple. It
first computes how binate each variable is in the
minimized cover expressions, i.e., how many times
the variable appears complemented and
uncomplemented in the sum-of-products
representation of the circuit functions. We first
minimize the given sum-of-products representation
with ESPRESSO and obtain its minimized
representation in cover format, which is a matrix
representation of the sum-of-products form. 0s
and 1s appear in each column of the cover. For
each variable we count the number of 0s (which
correspond to complemented variables) and 1s
(which correspond to uncomplemented variables).
2s, which correspond to the don’t care value in
cover format, are not counted. The variables are
then ordered by binateness, i.e., the most binate
variables are ordered first. The most binate
variable is the variable having the largest
number of 0s and 1s. If there is a tie, the
original ordering (that of the original benchmark
circuit data) is used. Although this is a very
simple heuristic, the experimental results in
chapter 5 show it to be very powerful.
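
The counting rule above fits in a few lines. The following is a minimal sketch, assuming the minimized cover is already available as rows of the characters '0', '1' and '2' (running ESPRESSO is outside its scope); the function name `order_by_binateness` and the sample cover are made up for illustration.

```python
def order_by_binateness(cover):
    """Return variable indices (0-based), most binate first.

    cover is a list of cube rows over the alphabet '0', '1', '2',
    where '2' is the don't care entry of the cover format.
    """
    n = len(cover[0])
    # Score of a variable: number of 0s and 1s in its column; 2s are ignored.
    score = [sum(row[j] in "01" for row in cover) for j in range(n)]
    # sorted() is stable, so variables with equal scores keep their original order.
    return sorted(range(n), key=lambda j: -score[j])

cover = ["2210",
         "2120",
         "1221",
         "2201"]
print(order_by_binateness(cover))
```

In this made-up cover the last column has four specified entries and the third has two, so those variables come first; the first two columns tie with one entry each and keep their original relative order.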


4. Optimization of variable orderings 

In this chapter we present a method to
optimize variable orderings by exchanging a
variable with its neighbor. As shown below, if
the BDD for the original variable ordering is
given, we can easily obtain the BDD for the
ordering in which only two neighboring variables
are exchanged. This is because we only have to
traverse and modify the nodes relating to the two
variables being exchanged.

Now suppose that we are exchanging the
variable of i-th order with the variable of
(i+1)-th order. Since BDDs are canonical forms,
sub-BDDs having only nodes whose variable
indices are from the 1st to the (i-1)-th and
sub-BDDs having only nodes whose variable indices
are from the (i+2)-th to the n-th remain
unchanged even after the variable exchange. So,
we only have to modify the parts of BDDs relating
to the nodes whose variable indices are i-th or
(i+1)-th.

There are several cases in the topology of
those parts of BDDs, which are shown in Fig. 5.
In this figure, f1, f2, f3, and f4 represent
different logic functions (or, in terms of BDDs,
they point to different nodes). The first and
simplest case is case 1 of Fig. 5: in the
original BDD, only nodes whose variable index is
(i+1)-th exist and there are no nodes whose
variable index is i-th. In this case we only
change the variable indices of the nodes; in
practice, there is no change in the BDD
structure. The same situation holds for case 2 of
Fig. 5, where only nodes whose variable index is
i-th exist.

Fig. 5 - Exchange between i-th and (i+1)-th
variables (panel d: case 4).

The general case is shown in Fig. 5c). In this
case, we change the edges from the nodes of the
i-th and (i+1)-th variables as well as the
variable indices of the nodes. However, if two of
f1, f2, f3, and f4 are the same, we may eliminate
some nodes, as shown in Fig. 5d). We can easily
check this by examining which ones are the same.
After changing parts of BDDs as above, we execute
the reduce operation of Ref. 7 only on those
modified parts of BDDs. This procedure is the
same for both BDDs and shared BDDs.
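
The exchange of two neighboring variables can be sketched with BDDs held as nested tuples. This is an illustration under assumptions, not the authors' pointer-based implementation: vertices are immutable tuples (level, 0-child, 1-child), so vertices above level i are rebuilt rather than left in place, and the reduce step is folded into the hypothetical helper `mk`. The helper names `build`, `cofactors`, `exchange` and `size` are also introduced here.

```python
def mk(level, lo, hi):
    """Reduction rule: a vertex with identical children is not created."""
    return lo if lo == hi else (level, lo, hi)

def build(f, n, level=1, fixed=()):
    """Reduced BDD of f; level k tests the k-th argument of f."""
    if level > n:
        return int(f(*fixed))
    return mk(level,
              build(f, n, level + 1, fixed + (0,)),
              build(f, n, level + 1, fixed + (1,)))

def cofactors(node, level):
    """Children of node with respect to the variable at `level`."""
    if isinstance(node, tuple) and node[0] == level:
        return node[1], node[2]
    return node, node               # node does not depend on that variable

def exchange(node, i, memo=None):
    """Exchange the variables at levels i and i+1 (levels count from 1)."""
    memo = {} if memo is None else memo
    if not isinstance(node, tuple) or node[0] > i + 1:
        return node                 # below the exchanged levels: unchanged
    if node in memo:
        return memo[node]
    if node[0] < i:                 # above: same test, children rewritten
        new = mk(node[0], exchange(node[1], i, memo), exchange(node[2], i, memo))
    else:                           # a vertex at level i or i+1
        f0, f1 = cofactors(node, i)
        f00, f01 = cofactors(f0, i + 1)
        f10, f11 = cofactors(f1, i + 1)
        new = mk(i, mk(i + 1, f00, f10), mk(i + 1, f01, f11))
    memo[node] = new
    return new

def size(node, seen=None):
    """Number of distinct non-terminal vertices."""
    seen = set() if seen is None else seen
    if isinstance(node, tuple) and node not in seen:
        seen.add(node)
        size(node[1], seen)
        size(node[2], seen)
    return len(seen)

F = lambda a, b, c, d: (a and b) or (c and d)
before = build(F, 4)                # order x1 < x2 < x3 < x4
after = exchange(before, 2)         # order x1 < x3 < x2 < x4
print(size(before), size(after))
```

When a vertex exists only at level i or only at level i+1, the code reduces to relabeling, which corresponds to cases 1 and 2 of Fig. 5; when some of the four cofactors coincide, `mk` drops the redundant vertices as in Fig. 5d).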

A variable exchange example is shown in
Fig. 6. In this figure, the i-th and (i+1)-th
variables are exchanged. We traverse from the
root nodes, and when we first arrive at a node
whose variable index is i-th or (i+1)-th (in the
figure, nodes A, B, C, and D), we apply the above
procedure to modify parts of BDDs (during the
traversal, node D is first reached directly from
node E). For example, node A is a case of
Fig. 5d), nodes B and C are reverse cases of
Fig. 5d), and D is a case of Fig. 5a). The
resulting BDD is shown at the bottom of Fig. 6.

Fig. 6 - An example of ordering exchange.

The above procedure is applied, and the sizes
of the BDDs before and after application of the
procedure are compared. If there is a gain (the
BDD after the application of the procedure is
smaller), we actually execute the variable
exchange; we call this procedure Var_exchange(i).
Var_exchange(i) is repeatedly applied while
incrementing i until no improvement is made. If
the i-th and (i+1)-th variables have been
exchanged, then in the resulting BDD there is a
possibility that the exchanged variables can be
further exchanged with their neighbors. So, we
mark those variable indices, and apply
Var_exchange(i) only to the marked ones. This
control procedure is shown in Fig. 7.


Var_exchange_control()
{
    for each i { s[i] = 1; }              /* i = 1 .. n-1: every index starts flagged */
    i = 1;
    while (some s[i] is 1) {
        if (s[i] == 1) {                  /* only a flagged index is tried */
            s[i] = 0;
            if (Var_exchange(i) == 1) {   /* exchange index i with i+1, if gained */
                if (i > 1)   s[i-1] = 1;  /* maybe exchangeable */
                if (i < n-1) s[i+1] = 1;  /* maybe exchangeable */
            }
        }
        i = i + 1;
        if (i > n-1) i = 1;               /* n is the number of primary inputs */
    }
}

Fig. 7 - Variable exchange sequence control
procedure.
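
The control procedure can be exercised without a BDD package by plugging in any exchange routine that reports whether it gained. The sketch below is an assumption-laden illustration: `var_exchange_control` follows the flagging scheme of Fig. 7 with explicit bounds checks, and the stand-in cost (the number of out-of-order pairs in a list) merely plays the role of the BDD size; neither is taken from the authors' code.

```python
def var_exchange_control(n, var_exchange):
    """Drive adjacent exchanges until no flagged position is left.

    var_exchange(i) must try to exchange positions i and i+1 (1-based),
    keep the exchange only if it reduces the size, and return True if so.
    """
    flagged = {i: True for i in range(1, n)}      # positions 1 .. n-1
    i = 1
    while any(flagged.values()):
        if flagged[i]:                            # only a flagged position is tried
            flagged[i] = False
            if var_exchange(i):
                if i > 1:
                    flagged[i - 1] = True         # left neighbour may now gain
                if i < n - 1:
                    flagged[i + 1] = True         # right neighbour may now gain
        i = i + 1 if i < n - 1 else 1             # wrap around after position n-1

# Stand-in for the BDD: the "size" is the number of out-of-order pairs,
# so an adjacent exchange gains exactly when the two entries are inverted.
order = [3, 1, 4, 2]

def toy_exchange(i):
    if order[i - 1] > order[i]:
        order[i - 1], order[i] = order[i], order[i - 1]
        return True
    return False

var_exchange_control(len(order), toy_exchange)
print(order)
```

The loop stops at an ordering that no single adjacent exchange improves, which is a local optimum and, for BDD sizes, not necessarily the best ordering.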


Table 1. Results of variable ordering methods applied to benchmark circuits in sum-of-products
         - Sizes of BDD for logic and permissible functions -

                Nodes for logic functions         Nodes for permissible functions
Circuit name    Original  Heuristic  Exchange     Original  Heuristic  Exchange
Apex1            228 331      4 893     4 596            -     38 891    37 586
Apex2*            20 563     12 530     9 652      158 526     71 068    52 293
Apex3                  -      5 621     5 635            -     40 956    41 163
Seq              258 595      7 424     5 869            -     39 665    32 386
Planet**           5 296      2 247     2 247       24 468      8 595     8 595
Sand               9 635      2 315     2 209       65 928     12 779    11 035
Styr               3 928      2 428     2 309       17 987     10 379    11 115
Scf                8 285      4 000     4 068       50 542     16 321    15 694

*: Subset of don’t cares (similar to Ref. 2) are used in minimization.
**: There is no improvement by variable exchange.

Machine: SUN4/260
Table 2. Results of variable ordering methods applied to benchmark circuits in sum-of-products
         - Synthesis quality and time -

                                 Synthesis time (ratio: exchange = 1.0)
Circuit name   Final literals    Original        Heuristic       Exchange
Apex1           1 139            -     (-)       3 256 (1.04)    3 132 (1.0)
Apex2*            189            1 094 (4.01)      384 (1.41)      273 (1.0)
Apex3           1 117            -     (-)       3 131 (1.01)    3 107 (1.0)
Seq               795            -     (-)       5 089 (1.18)    4 318 (1.0)
Planet**          513            1 350 (3.64)      371 (1.00)      371 (1.0)
Sand              421            3 194 (7.46)      503 (1.18)      428 (1.0)
Styr              403              774 (2.03)      393 (1.03)      381 (1.0)
Scf               830              900 (3.03)      297 (1.00)      297 (1.0)

*: Subset of don’t cares (similar to Ref. 2) are used in minimization.
**: There is no improvement by variable exchange.
Machine: SUN4/260
CPU time: second


Table 3. Results of variable ordering methods applied to benchmark circuits in multi-level logic

               Nodes for logic functions        Final      Synthesis time (ratio)
Circuit name   Original  Heuristic  Exchange    literals   Original      Heuristic      Exchange
Apex6             5 962      3 141     2 625       14      -    (-)        341 (1.05)     326 (1.0)
Apex7             7 748        948       926      279      84   (2.21)      43 (1.13)      38 (1.0)
Rot             207 302     65 876    44 376    1 193      -    (-)      -     (-)      2 328 (1.0)
C432             11 262     11 262     8 387      194      72.2 (1.46)    72.2 (1.46)    49.6 (1.0)
C880*                 -     27 656    24 763      413      -    (-)        208 (1.06)     196 (1.0)
C2670*                -    129 587    91 434      853      -    (-)      3 803 (1.35)   2 811 (1.0)
C5315*                -    162 232    73 962    2 061      -    (-)      -     (-)      7 451 (1.0)

*: Subset of don’t cares (similar to Ref. 2) are used in minimization.

Machine: SUN4/260


5. Experimental results 

We have applied the variable ordering
methods presented earlier to logic synthesis
benchmark circuits. Tables 1 and 2 show the
results when applied to sum-of-products
representation of the benchmark circuits. The
second, third, and fourth columns in Table 1 show
the number of nodes in BDDs for logic functions
by the original variable ordering, the heuristic
variable ordering presented in chapter 3, and the
variable orderings after the optimization of
chapter 4, respectively. The fifth, sixth, and
seventh columns show the number of nodes in BDDs
for permissible functions in the same way. We can
see from the table that the variable ordering
heuristic gives much better orderings than the
original orderings, and that in most cases those
orderings can be further improved by the variable
exchange method in chapter 4.

Tables 2 and 3 show the synthesis times for 
each variable ordering. The synthesis times in 
the columns of “exchange” include those for 
variable exchanges. Table 2 shows the results 
when applied to sum-of-products representation 
of the benchmark circuits whereas Table 3 
shows the results when applied to multi-level 
representation of the benchmark circuits. Note 
that there is a strong correlation among the sizes 
of BDDs for logic and permissible functions and 
the synthesis times. The performance presented 
here is better than other synthesis tools in terms 
of synthesis speed and quality. 


6. Conclusions 


We have presented variable ordering methods
for BDDs. Our approach is to first generate an
initial variable ordering and then optimize it.
Initial variable orderings are generated by
heuristics, and the generated orderings are
further optimized by exchanging variable orders.
The experimental results on benchmark circuits
show that our methods give very good orderings.

The presented method for optimizing variable
orderings only exchanges the orderings of two
neighboring variables. It can easily be extended
to change the orderings of k neighboring
variables. If k is large, there is a chance of
getting much better orderings, although it can be
very time consuming. This trade-off is one of our
future research topics. Also, we plan to apply
the methods to other application areas, such as
sequential circuit verification and test
generation.
This paper describes the characteristics and hot carrier effects of ultra-thin-film
SOI/pMOSFET’s between 90 and 295 K, and compares them with those of bulk
MOSFET’s. Decreasing the operating temperature further suppresses the short
channel effects of ultra-thin-film SOI/pMOSFET’s and improves their excellent
current drivability. However, the positive threshold voltage shift caused by electrons
that are trapped in the buried oxide during stressing is especially noticeable at low
temperatures. Provided the supply voltage can be reduced, ultra-thin-film
SOI/MOSFET’s are promising candidates for deep-submicron MOSFET’s operating at
low temperatures.


1. Introduction 

Compared with bulk MOSFET’s, silicon-on- 
insulator (SOI) MOSFET’s have no latch-up, a 
low parasitic capacitance, and enable a greater 
packing density. Moreover, ultra-thin-film SOI/ 
MOSFET’s have several additional advantages, 
for example, suppression of short channel 
effects, excellent subthreshold characteristics, 
and a greater carrier mobility’ ”. These advant- 
ages make ultra-thin-film SOI/MOSFET’s candi- 
dates for deep-submicron MOSFET’s”. Because 
scaled down MOSFET’s require a reduction in 
supply voltage to maintain device reliability, the 
operation temperature must be reduced to 
maintain a sufficient on/off margin of gate 
voltage. In this respect, ultra-thin-film SOI/ 
MOSFET’s are especially attractive because 
they offer the above advantages even at low 
temperatures. 

This paper compares the characteristics and
hot carrier effects of ultra-thin-film
SOI/pMOSFET’s between 90 and 295 K with those of
bulk pMOSFET’s.


2. Experiments 


2.1 Fully depleted SOI/MOSFET’s 
In fully depleted SOI/MOSFET’s, unlike bulk
MOSFET’s and conventional SOI/MOSFET’s, the Si
film in the channel region is fully depleted
under operational conditions, and the depth of
the source/drain junction is equal to the Si film
thickness (see Fig. 1). The advantages of
ultra-thin-film SOI/MOSFET’s mentioned in
Chap. 1 are due to these features.


2.2 Device processing 

In this study, separation by implanted 
oxygen (SIMOX) wafers with Si-films of 80-100 
nm and a buried oxide layer of 520 nm were 
used. Single drain p-channel MOSFET’s were 
fabricated on these wafers by the process 
described below. The active regions were defined 
by LOCOS isolation. Phosphorus ions were 
implanted into the channel region at energies 
between 30 and 40 keV at doses between 1.6 x 
10'' and 2.0 < 10° cm~*. The 10 nm gate oxide 
was grown at 1100 °C in O2/Ar. An N+ poly-Si
gate electrode was patterned by RIE.
BF2 ions were implanted into the source/drain
regions at energies between 35 and 50 keV at 
doses between 8.0 X 10" and 1.0 X 10° cm™?. 
Implantation was followed by annealing for 20
minutes at 850 °C in N2. Bulk MOSFET’s were
also fabricated on n-type, 10 Ω·cm (100) Si
wafers using the same process and channel control
doping. The junction depth, Xj, of bulk devices
was estimated by process simulation to be about
0.35 μm. The Xj of SOI devices is equal to the Si
film thickness, and was 80-100 nm in this study.

Fig. 1 - Cross sections of MOSFET’s: a) bulk
MOSFET, b) conventional SOI/MOSFET,
c) ultra-thin-film SOI/MOSFET. (Legend: P+
source/drain, buried oxide, field oxide;
t: depletion layer thickness.)


2.3 Measurement 

The effective channel length, Leff, was
obtained by using the Laux method. In this
study, devices with Leff = 0.4-10 μm and a
channel width of 20 μm were used. The
characteristics of the SOI and bulk devices were
measured at temperatures between 90 and 295 K.
The characteristics in the linear region were
measured at a drain voltage of -0.1 V and a
back gate voltage of 0.0 V. In order to
investigate the hot carrier effect, both devices
were stressed at 90 K and 295 K with a drain
voltage between -6.5 and -7.5 V and a back
gate voltage of 0.0 V.

Fig. 2 - Threshold voltage versus Leff for various
temperatures: a) SOI/MOSFET (SIMOX pMOSFET,
TSOI = 80 nm, Tox = 10 nm, W = 20 μm), b) bulk
MOSFET (bulk pMOSFET, Tox = 10 nm, W = 20 μm).
Horizontal axis: Leff (μm), 0 to 3.5.
Fig. 3 - Temperature dependence of effective
channel length: a) SOI/MOSFET (SIMOX pMOSFET,
Tox = 10 nm, W = 20 μm, TSOI = 80 nm), b) bulk
MOSFET (W = 20 μm). Horizontal axis: temperature
(K); one curve per channel doping level.


3. Results and discussion 
3.1 Short channel effects 

Figure 2 shows how the threshold voltage,
Vth, changes with Leff for various temperatures
for SOI and bulk devices having a channel
doping of 2.0 X 10° cm~*. When Leff is reduced,
the Vth of both devices becomes more positive.
Although short channel effects in SOI devices
are greatly suppressed at low temperatures,
short channel effects in bulk devices are
independent of temperature. This difference can
be explained by considering the temperature
dependence of Leff in SOI and bulk devices.

Figure 3 shows the differences between the
Leff at 295 K and the Leff at temperatures
between 90 and 295 K for various levels of
channel doping. In low-doped devices of both
types, Leff increases at low temperatures. For
example, reducing the temperature from 295 K
to 90 K increases Leff by 0.40 μm in SOI devices
with a channel doping of 2.0 < 10'em~*, and by 
0.08 μm in their bulk counterparts. On the other
hand, short channel effects in bulk devices with
a high channel doping depend only slightly on
temperature. These results show that in devices
with a low channel doping, the gate controls the
channel region more effectively at temperatures
below 295 K.

It is known that low temperature operation
suppresses short channel effects in bulk devices
with a low channel doping. As the temperature
is reduced, the Fermi level of Si approaches the
band edge (the conduction band edge in this
case). This increase in the Fermi level, ΔE_F,
increases the depletion layer thickness, l_d, as
follows (see Fig. 4):

$l_d(\mathrm{MOS}) \propto (2\,\Delta E_F)^{1/2}$,
$l_d(\text{p-n}) \propto (\Delta E_F)^{1/2}$.

These relations indicate that the channel region
is effectively controlled by the gate at low
temperatures. In devices with a lower channel
doping, ΔE_F is larger; therefore, the
suppression of short channel effects is more
pronounced.

We will now discuss the reasons for the
suppression of short channel effects in SOI
devices (see Fig. 5). S1 and S2 represent the
depletion region controlled by the gate and the
depletion region controlled by the source/drain,
respectively. Points A, B, C, and D in the
SOI/MOSFET shown in Fig. 5 are assumed to
have the following characteristics:
1) The lateral electric field is 0 V/cm.
2) The Fermi level is equal to the intrinsic
Fermi level.
However, regardless of the correctness of these
assumptions, the calculated charge sharing
coefficient, S1/(S1 + S2), is independent of
temperature for both devices. This implies that
the two-dimensional charge distribution
suppresses short channel effects in SOI devices.
Reducing the junction depth, Xj, further
suppresses the short channel effects. The Xj for
SOI devices (80-100 nm) is much smaller than
that of bulk devices (350 nm). Therefore, at low
temperatures, short channel effects are
suppressed more effectively in SOI devices than
in bulk devices.

Fig. 4 - Charge sharing in bulk MOSFET: a) room
temperature, b) low temperature.

Fig. 5 - Charge sharing model for SOI/MOSFET.
Fig. 6 - Temperature dependence of field effect
mobility (pMOSFET, Tox = 10 nm, W = 20 μm,
L = 2.0 μm; SIMOX TSOI = 80 nm). Vertical axis:
μFE (cm²/Vs); horizontal axis: temperature (K).
3.2 Carrier mobility 

Figure 6 shows the dependence of field
effect mobility, μFE, on temperature as
determined from the transconductance, gm, in the
linear region of devices with a channel doping of 
2.0 X 10° cm~4. The “yy of SOI devices is larger 
than that of the bulk devices throughout the
temperature range. The μFE of SOI devices
changes more noticeably with temperature than
the μFE of bulk devices. The μFE of SOI devices
increases from 155 to 1000 cm²/Vs as the
temperature falls from 295 to 90 K. The change in
bulk devices over the same range is from 130 to
630 cm²/Vs. Assuming that μFE is proportional to
T^-x, x is 1.8 for SOI devices and 1.5 for bulk
devices, indicating that the μFE of SOI devices
at low temperatures is larger than that of bulk
devices.

Fig. 7 - Band diagram and charge distribution for
MOSFET.

Since both devices in the figures have the
same channel doping, ionized dopant scattering
affects the μFE of both devices to the same
degree. Therefore, it can be assumed that the
difference between the μFE of SOI and bulk
devices is due solely to the charges that form an
inversion layer and establish an electric field,
Es, perpendicular to the channel direction.
We will now discuss the effect of Es on μFE
based on calculations for a one-dimensional
MOS structure (see Fig. 7). The total induced
charge, Qa, which forms the inversion layer is
related to Es by the following formula:

$E_s = Q_a/\varepsilon_{ox}$.


As Es increases, μFE decreases. The Qa values
for SOI and bulk devices at various temperatures
are shown in Table 1. The table shows that:

1) At any temperature, the Qa for SOI devices,
Qa(SOI), is less than that for bulk devices,
Qa(bulk).

2) The ratio Qa(bulk)/Qa(SOI) increases as the
temperature decreases.

The first of the above observations explains why
the μFE of SOI devices is greater than that of
bulk devices at any temperature. The second
observation explains why this difference in μFE
is enhanced at low temperatures.

Table 1. Total charge density required to form an
inversion layer

T (K)   Qa(SOI)   Qa(bulk)   Qa(bulk)/Qa(SOI)
 90      1.94      4.62       2.38
150      1.90      4.40       2.32
200      1.88      4.22       2.24
250      1.85      4.02       2.17
295      1.83      3.89       2.13

(unit) Qa: ×10^11/cm²
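
The ratio column of Table 1 can be recomputed directly from the two charge columns. The short check below is only an arithmetic illustration; the dictionary layout and the variable names are assumptions, and the numbers are transcribed from Table 1.

```python
# Qa values transcribed from Table 1: temperature (K) -> (Qa(SOI), Qa(bulk)).
table = {
    90:  (1.94, 4.62),
    150: (1.90, 4.40),
    200: (1.88, 4.22),
    250: (1.85, 4.02),
    295: (1.83, 3.89),
}

# Ratio Qa(bulk)/Qa(SOI) at each temperature.
ratios = {T: bulk / soi for T, (soi, bulk) in table.items()}

for T in sorted(ratios):
    print(f"{T:3d} K  ratio = {ratios[T]:.2f}")
```

The recomputed ratios agree with the last column of the table to two decimals and fall steadily as the temperature rises from 90 K to 295 K.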


3.3 Subthreshold characteristics 

Figure 8 shows the dependence of subthreshold
swing, S, on temperature for SOI and bulk
devices. The dashed lines in the figure show the
theoretical limits for S, i.e. (kT/q)·ln 10 at
each temperature. The difference between the
theoretical limits and experimental values for
SOI devices is independent of temperature, but
the difference for bulk devices decreases when
the temperature decreases.

In general, S is given by the following
equation:

$S = (kT/q)\,\ln 10\,\{1 + (C_d + C_{it})/C_{ox}\}$,

where Cd and Cit are the depletion layer
capacitance and equivalent interface state
capacitance, respectively. Since Cd for SOI
devices is almost zero, S for SOI devices at
295 K is nearly equal to the theoretical value.
However, the value of S for SOI devices may be
partially determined by the effects of the
interface states at the front gate (gate
oxide/SOI) and the interface states at the back
gate (SOI/buried oxide).
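
For orientation, the theoretical limit (kT/q)·ln 10 can be evaluated at the two ends of the measured temperature range. The sketch below is an illustration, not part of the original measurements; the function names and the capacitance values in the last line are hypothetical, and only the physical constant k/q is a known quantity.

```python
import math

K_OVER_Q = 8.617333262e-5   # Boltzmann constant / elementary charge, in V/K

def swing_limit(T):
    """Theoretical limit (kT/q) ln 10 in mV/decade at temperature T (kelvin)."""
    return K_OVER_Q * T * math.log(10) * 1e3

def swing(T, c_d, c_it, c_ox):
    """S = (kT/q) ln 10 {1 + (Cd + Cit)/Cox}; capacitances in any common unit."""
    return swing_limit(T) * (1 + (c_d + c_it) / c_ox)

for T in (295, 90):
    print(f"{T} K: limit = {swing_limit(T):.1f} mV/decade")

# Hypothetical capacitance ratio, chosen only to show the effect of Cit.
print(f"{swing(295, 0.0, 0.1, 1.0):.1f} mV/decade")
```

The limit is about 58.5 mV/decade at 295 K and about 17.9 mV/decade at 90 K, so a fixed ratio (Cd + Cit)/Cox produces a smaller absolute deviation at low temperature.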

The ratios of the experimental values to the
theoretical limits, i.e. {1 + (Cd + Cit)/Cox} in
the above formula, are shown in Fig. 9. When the
temperature decreases, the ratio for SOI rapidly
increases, especially at 90 K. This can be
attributed to the effects of the interface states
at the back gate. The electrical characteristics
of the buried oxide used in this study, which is
formed by implanting O+ into the Si substrate,
seem to be inferior to those of the thermal
oxide. When the temperature decreases, E_F and
therefore the surface potentials at the front and
back gates increase. Therefore, the increase in
surface potential at the back gate interface at
low temperatures may cause the subthreshold
characteristics of SOI devices to deviate more
rapidly from the theoretical limit than is the
case for bulk devices.

Fig. 8 - Temperature dependence of subthreshold
swing.

Fig. 9 - Temperature dependence of normalized
subthreshold swing.

3.4 Hot carrier effects 

Hot carrier effects were investigated for 
both devices with an Ler of 1.5 2m and a chan- 
nel doping of 2.0 x 10° cm~’. Since the changes 
in the characteristics of bulk devices caused by
stressing are most noticeable at the maximum gate
current, Ig, we chose a stress condition for both
devices such that Ig was maximum at
Vds = -6.5 V. The polarity of Ig shows that it
is an electron current. This indicates that the
electrons generated by impact ionization near the
drain edge are injected into the gate oxide, and
that some of them are trapped in the gate oxide.

Figure 10 shows how Δgm/gm0 changes with
the stress time at 295 K and 90 K. For
both devices, Δgm/gm0 is positive and is larger
at 90 K than at 295 K. At both temperatures, the
Δgm/gm0 values for SOI devices are larger than
those for bulk devices. The Δgm/gm0 values are
positive because of the reduction in Leff due to
fixed negative charges that are produced in the
gate oxide by the electron injection during
stressing. Analysis of the Δgm/gm0 of SOI
devices that were stressed for 1000 s indicated
that Leff was reduced by between 0.10 and
0.15 μm.

Figure 11 shows how Vth changes with the
stress time. Although ΔVth can be detected only
for SOI devices, this cannot be explained by the
short channel effects caused by the reduction in
Leff.

Fig. 10 - Change in transconductance versus stress
time.

Fig. 11 - Change in threshold voltage versus stress
time (pMOSFET, Tox = 10 nm, W = 20 μm,
Leff = 1.5 μm, TSOI = 100 nm, Vds = -6.5 V, Vgs at
maximum Ig). Horizontal axis: stress time (s).

The hot carrier effects in SOI devices are
associated with the electrons that are injected
during stressing and become trapped in the buried
oxide. The effect of charges in the buried oxide
on the front gate characteristics is equivalent to
the effect of substrate bias in bulk devices.
Hereafter, this equivalent substrate bias effect
will be called the charge coupling effect.

Fig. 12. Effects of trapped electrons in gate and
buried oxide on transconductance.
(W/Leff = 20 μm/2.0 μm, Tox = 10 nm, Vds = −7.5 V;
ΔLeff = 0.116 μm.)

Trapped electrons cause a positive ΔVth in SOI
devices. The trapped electrons in the buried
oxide increase the gm of SOI devices because
they reduce F,'. Therefore, trapped electrons in 
both the gate oxide and the buried oxide affect
Δgm/gm0. Figure 12 shows how the effects of the
front gate and the buried oxide are separated by
the method described in Ref. 11. The changes in
SOI device characteristics occur not only
because of the reduction of Leff due to trapped
electrons in the front gate oxide, but also because of
the charge coupling effect due to the electrons
that are trapped in the buried oxide during
stressing.


4. Conclusion 

The characteristics and hot carrier effects of 
ultra-thin-film SOI/pMOSFET’s at low tempera- 
tures were studied and then compared with those 
of bulk pMOSFET’s. 

At low temperatures, the suppression of
short channel effects is greater in SOI devices
than in bulk devices having the same channel
doping. This occurs because, at low temperatures,
Leff in SOI devices increases more rapidly
than in bulk devices and because the Xj of SOI
devices is much smaller than that of bulk
devices. The low-temperature carrier mobility of
SOI devices increases more rapidly than that of
bulk devices. The above is supported by
calculations which show that the total charge
required to form an inversion layer is less for
SOI devices than for bulk devices. The deviation
from the theoretical value of the subthreshold
swing of SOI devices at low temperatures may
be due to the effect of the back gate interface,
which is caused by changes in the Fermi level of
Si with temperature.

The changes in the characteristics of SOI
devices which are due to hot carrier effects are
larger than in bulk devices. Also, the changes in
the characteristics of SOI devices are larger at
lower temperatures. Stressing in SOI devices
traps electrons in the gate oxide and buried
oxide. The trapped electrons in the buried oxide
not only cause a positive shift in Vth but also
increase gm due to the charge coupling effect.
From the viewpoint of reliability, the supply
voltage of SOI devices must be reduced when
they are operated at low temperatures. Provided
the supply voltage can be reduced, ultra-thin-film
SOI devices are promising candidates for
deep-submicron MOSFET's operating at low
temperatures.

The effects of mechanochemical polishing on the surface flattening and the
reliability of MOS diodes have been studied. Surface microroughness was observed
with TOPO-2D, XTEM, STM, and AFM. After MOS diodes were fabricated on the
polished wafers, the Fowler-Nordheim tunneling current, time-dependent dielectric
breakdown (TDDB), surface state density, and flat band voltage under stress were
measured.

It is confirmed that polishing leads to excellent TDDB characteristics, because the
polishing reduces the surface roughness atomically, which decreases the tunneling
current through the oxide. Polishing also lowers the surface state density and
decreases the flat band voltage shift under constant current stress.


1. Introduction 

State-of-the-art techniques in the study of
surface morphology, e.g. high-resolution
transmission electron microscopy (HRTEM),
reflection electron microscopy (REM), spot
profile analysis of low-energy electron diffraction
(SPA-LEED), scanning tunneling microscopy
(STM), and atomic force microscopy
(AFM), have shed new light on what happens
when tunneling occurs in the thin oxide layers in
VLSI circuits.

Roughness at the Si-SiO2 interface was
observed with HRTEM. The peaks of the
interface roughness for 4.1 nm oxides grown at
900 °C were about 1.4 nm. These degrade the
dielectric breakdown strength. The interface
roughness measured by SPA-LEED affects the
fixed oxide charge density, interface state
density, and Hall mobility. Ordinary SC-1
solution (NH4OH : H2O2 : H2O = 1 : 1 : 5)
causes microroughness, which degrades dielectric
breakdown characteristics.

Using STM, AFM, XTEM, and TOPO-2D
(optical interferometry), we observed the surfaces
of silicon wafers after different polishing
treatments and investigated the effects of the
roughness on the reliability of MOS diodes.


2. Wafer polishing 

Figure 1 shows the mechanochemical polishing
configuration. Mechanochemical polishing
flattens silicon wafers without causing surface
damage. Polishing uses a polyurethane pad and
an alkaline solution containing a silica powder
with particles 0.02 μm in diameter. The fine
particles of colloidal silica mechanically remove
protruding atoms as the polishing plate rotates.
The alkaline solution chemically etches the
surface layer damaged by mechanical treatment.
After polishing, the wafer surfaces become
mirror-like and flat on an atomic scale across
the wafer.

Fig. 1. Wafer polishing system. The wafers are
attached to the rotating plate and the
polishing pads to the rotating turntable.
(Labels in the figure: polishing solution, wafer,
plate, turntable, polishing pad; adhesion and
polishing steps.)

Table 1. The dependence of the surface roughness
and the silicon removal on the second
polishing time

Second polishing time (min)   0      5      60
Silicon removal (nm)          0 483 (two values legible)
Roughness (nm)                1.59   1.05   0.26


In this experiment, we used 4-inch 
boron-doped 10-ohm-cm (100) CZ-Si wafers. The 
polishing solution was Glanzox 3 000 diluted 10 
times, and the polishing pad was a Ciegal 
7355-000. The pressure was 90 g/cm². After
prepolishing on both sides, wafers were polished 
for 0, 5, and 60 minutes. Silicon removal is 
proportional to the polishing time (see Table 1). 


3. Microroughness evaluation 
3.1 Optical profilometer 

The TOPO-2D optical profilometer can
determine the roughness over 0.69 mm in a
lateral direction using an interferometer. The
incident light (λ = 650 nm) is split in two, and
one beam is directed to the sample surface. The
light intensity is changed by the surface
roughness and the two beams are then
recombined. The height of the protrusions is
determined from the resulting light intensity.
The detection limit was 0.12 nm and the
repeatability 0.03 nm. The lateral resolution was
0.65 μm. To compare the roughness of each
sample, the root mean square roughness (Rrms)
was calculated by the following equation,

$$ R_{rms} = \sqrt{\frac{1}{L}\int_{0}^{L} h^{2}(x)\,dx}, \qquad (1) $$

where L is the measured distance and h(x) is the
height at x (see Fig. 2).
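
As a minimal sketch of the root mean square roughness defined above, the integral of h(x) squared over the measured distance L can be approximated from a sampled height profile. The uniform sample spacing, the trapezoidal rule, the function name `rms_roughness`, and the sinusoidal test profile are illustrative assumptions; they are not part of the TOPO-2D instrument. The heights are taken relative to the mean (virtual) plane.

```python
import math

def rms_roughness(heights_nm, spacing_nm):
    """Approximate Rrms = sqrt((1/L) * integral of h(x)^2 dx) by the trapezoidal rule.

    heights_nm: heights h(x) relative to the mean plane, sampled at equal spacing.
    spacing_nm: distance between neighbouring samples (assumed uniform).
    """
    if len(heights_nm) < 2:
        raise ValueError("need at least two samples")
    length = spacing_nm * (len(heights_nm) - 1)   # measured distance L
    squares = [h * h for h in heights_nm]
    integral = sum((a + b) / 2.0 * spacing_nm for a, b in zip(squares, squares[1:]))
    return math.sqrt(integral / length)

# Hypothetical profile: a sinusoid of amplitude 1 nm sampled over ten whole periods.
profile = [math.sin(2 * math.pi * i / 100) for i in range(1001)]
print(round(rms_roughness(profile, spacing_nm=1.0), 3))
```

For a sinusoidal profile the result approaches the amplitude divided by the square root of 2, which gives a quick sanity check on any measured profile.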

The Rrms was 1.59 nm after the first polishing,
but the surface became much flatter as
the second polishing time increased (Table 1).
The mean distance between protrusions was 100
μm.

Fig. 2. Measurement of the surface roughness
with an optical interferometer. (Labels in the
figure: incident beam, reflected beam, sample
surface h(x), virtual plane.)

Fig. 3. XTEM images of the interface roughness.
(Panel c): Rrms = 0.26 nm.)

3.2 XTEM 

High-resolution XTEM gives the position of 
each atom from its lattice image (see Fig. 3). 
The samples were thinned by argon ion milling 
for 30 to 35 hours. The acceleration voltage was 
200 keV and the magnifying power 4 x 10°. We 
observed undulations 2 to 3 atoms high at the
Si-SiO2 interface of the 1.59-nm Rrms sample,
but the interfaces of the 1.05- and 0.26-nm
samples were comparatively flat.

The XTEM samples were very thin, but
they were still a few nm thick in the direction of
the incident beam. The images are affected by
the total thickness, so it is difficult to observe
the true surface of the sample.

Fig. 4. STM images of the silicon surface.
Scan area was 10 nm × 10 nm.
(Panels: a) Rrms = 1.59 nm, b) Rrms = 1.05 nm.)


3.3 STM 

STM produces a two-dimensional profile.
The sample must be conductive, because STM
monitors the tunneling current. The STM images
were taken in air immediately after surface
oxides were removed with a 5 percent HF
solution (see Fig. 4). While the surface of the
1.05-nm sample was atomically flat and no
protrusion was detectable, the surface of the
1.59-nm sample showed protruding atoms.
Protrusions over 1 nm high were seen. The
height of the protrusions and the distance
between them varied.

STM shows that mechanochemical polishing
can, with time, flatten the wafer surface.

Fig. 5. AFM images of the silicon surface.
Scan area was 500 nm × 500 nm.
(Panel c): Rrms = 0.26 nm.)

Table 2. Comparison of methods

Method     Interval (nm)   Height (nm)
TOPO-2D    ~1 × 10^5       1.59 (Rrms)
TEM        ~25             ~0.8
STM        ~2              0.3 - 0.7


3.4 AFM 

AFM uses the atomic force between the tip
of the probe and the sample surface; therefore,
the sample need not be conductive. The AFM
observation was done immediately after oxide
removal to avoid surface contamination. The
scan area of the AFM profiles (see Fig. 5) is 2.5 ×
10^5 nm², which is 2 500 times as large as that of
the STM profiles (see Fig. 4).

The maximum heights of the protrusions on
the 1.59-, 1.05-, and 0.26-nm samples, which are
shown in Figs. 5a), b), and c) respectively, were
about 2, 1, and 1 nm. AFM also shows that
polishing flattens the wafer surface of samples.


3.5 Comparison of methods 

The estimated intervals between protrusions
and heights of the 1.59-nm Rrms sample obtained
using the above methods are listed in Table 2.
Although the difference in protrusion heights is
small, the distances between protrusions obtained
by the various methods are quite different. To
observe the roughness which affects the increase
in electrical field in SiO2, it is necessary to use
a method whose resolution is finer than
the SiO2 thickness.

TOPO-2D catches the roughness over an
interval of approximately 100 μm. Even
high-resolution XTEM merely catches protrusions
about 25 nm apart, which is more than the
thickness of the oxide layer. XTEM showed that
the oxide apparently formed uniformly along
such long-period protrusions. In this experiment,
only STM reveals the effect of the roughness on
the local thinning of the oxide layer on a nm
order.

Fig. 6. Fowler-Nordheim current for three surface
roughnesses.


4. Electrical properties 

To determine the effects of polishing on the 
reliability of the MOS diodes, diode arrays were 
fabricated on the wafers. After acid cleaning, 
oxide layers 11 nm thick were thermally grown 
at 900 °C. Poly Si gates were deposited after gate
oxidation. The samples were annealed in 5 percent
H2/N2 at 450 °C, and gold contacts were
evaporated onto the back of the wafers.


4.1 Fowler-Nordheim current 

The Fowler-Nordheim (FN) current densi- 
ties obtained with an electric field stress of 9 
MV/cm were measured (see Fig. 6). The FN 
current through the oxides gradually decreased 
as the surface was flattened. 

The FN current Jflat through a flat interface
is described as:

$$ J_{flat} = A E^{2} \exp(-\beta/E), \qquad (2) $$

where A, E, and β are the proportional factor,
the electrical field, and the exponential factor,
respectively. Assuming that the surface roughness
has a sinusoidal wave form, the FN current
Jrough through a rough interface is expressed by
the following equations:

$$ J_{rough} = C J_{flat}, \qquad (3) $$

$$ C = 1 + \frac{1}{12}\left(\alpha^{2} + 4\alpha + 6\right)\varepsilon^{2}, \qquad (4) $$

where α = β/E and ε = Δ/d0. Δ is the amplitude
of the sinusoidal wave form and d0 is the
average thickness of the oxide. The increasing
factor of the surface roughness C can be
estimated using Equation (4). Substituting the
roughness height 2Δ shown in Table 2 into
Equation (4) with E = 9 MV/cm, β = 3.2 × 10² MV/cm,
and d0 = 11 nm, we calculated that C = 1.16
for 2Δ = 0.8 nm from the XTEM image, C = 1.12
for 2Δ = 0.7 nm from the maximum value of the
STM image, and C = 1.02 for 2Δ = 0.3 nm from
the minimum value of the STM image.
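
The three values of C quoted above can be checked with a short calculation. This sketch reads Equation (4) as C = 1 + (1/12)(α² + 4α + 6)ε², with α equal to the exponential factor divided by E and ε equal to the amplitude divided by the average oxide thickness, and it takes the exponential factor as 3.2 × 10² MV/cm; that reading is an assumption, adopted because it reproduces the quoted figures. The function name `roughness_factor` is illustrative.

```python
def roughness_factor(two_delta_nm, d0_nm=11.0, field_mv_cm=9.0, beta_mv_cm=320.0):
    """Increasing factor C = 1 + (1/12) * (alpha^2 + 4*alpha + 6) * eps^2.

    two_delta_nm: peak-to-peak roughness height 2*Delta from Table 2.
    beta_mv_cm:   FN exponential factor (assumed to be 3.2e2 MV/cm).
    """
    alpha = beta_mv_cm / field_mv_cm        # alpha = beta / E, dimensionless
    eps = (two_delta_nm / 2.0) / d0_nm      # eps = Delta / d0
    return 1.0 + (alpha ** 2 + 4.0 * alpha + 6.0) * eps ** 2 / 12.0

for two_delta in (0.8, 0.7, 0.3):           # XTEM, STM maximum, STM minimum
    print(two_delta, round(roughness_factor(two_delta), 2))
```

Rounded to two decimals, the results are 1.16, 1.12, and 1.02, matching the values given in the text.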

The measured FN current shows that Jrough
= 1.09 Jflat for the rough (Rrms = 1.59 nm) and the
smooth (Rrms = 0.26 nm) wafers (see Fig. 6). An
increasing factor of 1.09 reflects the morphology
shown by the STM images. The oxide layer
could not form evenly on the 1.59-nm sample
observed in the STM image. The oxide might be
thinned at the protrusions, which increases the FN
current. Mechanochemical polishing decreases
the FN current through its atomic-scale flattening
effect.


4.2 Time-dependent dielectric breakdown 

(TDDB) characteristics 

The TDDB characteristics for three samples
with different surface roughnesses were measured
under a constant field of 9 MV/cm (see
Fig. 7). We found that the degradation starts
earlier, and the cumulative failure is higher, as
the surface gets rougher. The time taken to
reach a cumulative failure rate of 10 percent for
the 0.26-nm sample was 1 500 times as long as
that for the 1.59-nm sample. The relationship
between the TDDB lifetime Tbd and the applied
field E is expressed by the following equation for
the MOS diodes with uniform interfaces:

$$ T_{bd} = B \exp\{(\beta + H)/E\}, \qquad (5) $$

where B and H are the proportional factor and
the exponential factor in the impact ionization
coefficient, respectively, and β is the exponential
factor of the FN current. Although Equation (5)
cannot express the difference between Tbd for
samples with different roughnesses quantitatively,
the increase of E due to the local thinning
reduces Tbd significantly.

Fig. 7. Time-dependent dielectric breakdown
characteristics for three surface roughnesses.

Fig. 8. Surface state density for three surface
roughnesses when constant current is
injected.

We deduced the TDDB degradation mecha- 
nism from the roughness observations and the 
FN characteristics. Protrusions shown in the 
STM image (see Fig. 4) cause a local thinning of 
the oxide, which increases the FN current (see 
Fig. 6). When a high field is applied at this point, 
injected tunneling electrons are accelerated and 
become hot electrons. These hot electrons form 
hole traps in the oxide by impact ionization and, 
in time, oxide breakdown may develop at local 
thinning spots. 

Mechanochemical polishing significantly improves
the reliability of MOS diodes.

Fig. 9. Flat band voltage shift under constant
current stress.


4.3 Surface state density 

The surface state densities of the three
samples with different surface roughnesses were almost
the same when the injected carrier density (Ninj)
was 1 X 10° cm ° (see Fig. 8). Small differences 
appeared, however, when the Ninj was 1 x 10" 
cm *; surface state density increases with surface 
roughness. When Ninj was 1 x 10% cm®%, the 
maximum surface state density at mid-gap was
obtained for the roughest wafer. We think this
increase on a rough wafer is caused by a
deviation from the (100) crystal orientation at
the rough interface. A (111) surface has a surface
state density twice as high as a (100) surface
under the same carrier injection. Rough
interfaces allow more dangling bonds to form at
the Si-SiO2 interface than smooth interfaces, and
this increases the surface state density.


4.4 Flat band voltage shift 

Figure 9 shows the flat-band voltage shift
(ΔVfb) caused by constant current stress. Large
roughness increases ΔVfb for injected carrier
densities over 10'° cm *. The accumulation of 
positive charges in the oxide layer increases with
the roughness of the interface. Polishing is also
effective in decreasing ΔVfb.


5. Conclusion 

We studied the effects of wafer polishing on
the reliability of MOS diodes. We found that
flattening the wafer surface by polishing
decreases the Fowler-Nordheim current, and
increases the TDDB lifetime under stress. If
large protrusions are left on the surface, they
form local thinning spots after oxidation. When
a high field is applied at these points, injected
tunneling electrons are accelerated and become
hot electrons. These hot electrons form hole
traps in the oxides by impact ionization and, in
time, oxide breakdown may develop at local
thinning spots.

FUJITSU Sci. Tech. J., 29, 2, (June 1993)

We also found that polishing lowers the
surface state density and decreases the flat band
voltage shift under constant current stress. The
increase of the surface state density on a rough
surface is caused by aberrations in the (100)
crystal orientation.

This paper describes a digital signal processing quadrature phase shift keying
(QPSK) burst demodulator for satellite communications that uses a new technique
which sequentially changes the receive filter. By changing the filter bandwidth
according to the signal preamble pattern, the proposed technique improves the
recovered carrier S/N and the recovered symbol clock S/N. The demodulator's low
unique word miss probability (Pmiss) and its low cycle-skip rate for the recovered
carrier and symbol clocks at very low Eb/N0 (3 dB or less) will ease the design of
very small aperture terminal (VSAT) systems. A governmental communications
application for the system is briefly discussed.


1. Introduction 

Communication satellites such as the Japan
Communication Satellites (JCSATs) and the
Space Communication Satellites (SCSs) have
been launched in Japan recently. Their
availability stimulated the development of very
small aperture terminal (VSAT) communication
systems.

These systems usually consist of many
VSATs and a central hub station. Although
the VSAT's relatively small antenna (0.8 to 1.2
meters) is a great cost advantage, it also makes
the signal quality very low (Eb/N0 = C/2N = 3
dB or less). Therefore, high coding gain forward
error correction (FEC) is indispensable to
improve the bit error rate (BER).

A high coding gain forward error corrector,
such as a Viterbi decoder having a constraint
length (k) of 7, requires a burst demodulator
that operates at very low bit-energy to noise
ratios (Eb/N0), that has a low unique word miss
probability (Pmiss), and that has a low cycle-skip
rate (Pcs). Since both carrier and clock
synchronizing performance must be improved to meet
these requirements, practical designs
conventionally adopt the following three techniques:

1) loop-bandwidth-variable carrier recovery
(CR) and symbol timing recovery (STR)
circuits,
2) binary phase-shift keying (BPSK) demodulation
instead of QPSK during the preamble to
increase the Eb/N0, and
3) a kick-off circuit for STR to prevent
hang-up.

Our digital signal processing (DSP) QPSK
demodulator also uses these techniques. However,
to take advantage of recent FEC advances such
as a k = 9 Viterbi decoder, a concatenated
Reed-Solomon/Viterbi (k = 7) decoder, or a
sequential decoder, additional techniques
will be required to improve Pmiss and Pcs.

To meet these strict requirements, we
propose a technique to change the receive filter
bandwidth according to the preamble pattern.
Theoretically, this technique should
result in a recovered carrier S/N improvement of
1 dB and a recovered symbol clock S/N
improvement of at least 3 dB. Measurement
verified that the DSP demodulator improves
Pmiss by a factor of 100 over that of a conventional
DSP demodulator.


2. Major requirements

Of the major performance requirements for a
burst demodulator (see Table 1), we must first
consider the cycle-skip rate. This should be on
the order of 10-® at an Eb/N0 of 0 dB. Thus,
if the VSAT system uses a Viterbi decoder (k =
9, R = 1/2), the BER augmentation due to cycle
skip should be less than 0.1 % at 0 dB. We must
next consider Pmiss and assume that its target
value is 10⁻⁷ at an Eb/N0 of 3 dB for a unique
word length of 32 symbols and a tolerance of 14
bits. This means only one lost burst (2 kbits) per
86 hours at a 64 kb/s data rate. This Pmiss value
eases VSAT system design.

Table 1. Performance requirements

Modulation system              QPSK
Symbol rate                    480k symbols/s
Carrier frequency error        ±4 kHz
Carrier cycle-skip rate        10-° times/symbol at 0 dB Eb/N0
Clock cycle-skip rate          10-° times/symbol at 0 dB Eb/N0
Unique word miss probability   10⁻⁷ times/symbol at 3 dB Eb/N0


3. Conventional burst demodulation methods

This chapter briefly describes conventional
carrier and clock synchronization methods. The
burst data format is shown in Fig. 1 for reference.

1) CR and STR noise-bandwidth variation

During the data and unique word portion of
the data burst, the CR noise-bandwidth upper
bound (Bucr) is determined by the carrier
cycle-skip rate. However, since a wide CR noise
bandwidth (Blcr) is needed for initial carrier
acquisition, we vary Blcr several times during the
CR portion, and the last Blcr must be below Bucr
before the data portion begins. This is easily
done using ROM-stored parameters to determine
Blcr, and is easily implemented in the CR and
STR blocks.

2) BPSK demodulation

The preamble consists of CR and STR
portions. In the CR portion, the I and Q-channel
signals are all 1s. During the STR portion, both
channels consist of alternate 1s and 0s. Therefore,
this is BPSK modulation, rather than
QPSK. We use BPSK demodulation to increase
Eb/N0. Before the unique word is received, the
demodulation method must be changed from
BPSK to QPSK by selecting the QPSK CR phase
detector.

Fig. 1. Burst data format. (Labels in the figure:
preamble, 200 symbols; data, 1 000-3 000;
receive filter.)

3) Kick-off 

The STR block exhibits a hang-up
phenomenon that causes STR hesitation before
phase lock is obtained and which depends on
the initial phase condition between the receive
and STR clocks. As a result, the decrease in
Pmiss is small even if the Eb/N0 exceeds the
critical level. Kick-off is often used to prevent
hang-up, which can be detected by integrating
the receive filter output signal at the STR timing
over a 10-symbol period during the STR portion
of the burst format. If hang-up is detected, the
symbol clock is inverted to cancel it.
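
The kick-off decision described in item 3) can be illustrated with a small sketch. The text states only that the receive filter output is integrated at the STR timing over 10 symbols; integrating the magnitude, comparing it with a fixed threshold, and modelling the filtered alternating preamble as a cosine at half the symbol rate are assumptions made here for illustration. The names `detect_hang_up` and `str_samples` are hypothetical.

```python
import math

def detect_hang_up(samples_at_str_timing, threshold):
    """Return True when the integrated magnitude falls below the threshold.

    samples_at_str_timing: receive filter outputs taken at the recovered symbol
    timing during the STR portion (a 10-symbol window in the text).
    threshold: decision level; its value is an assumption of this sketch.
    """
    integrated = sum(abs(s) for s in samples_at_str_timing)
    return integrated < threshold

def str_samples(phase_offset_symbols, n_symbols=10):
    """Alternating 1/0 preamble after filtering, modelled as a tone at Br/2."""
    return [math.cos(math.pi * (k + phase_offset_symbols)) for k in range(n_symbols)]

locked = str_samples(0.0)     # sampling at the signal peaks
hung_up = str_samples(0.5)    # sampling at the zero crossings
print(detect_hang_up(locked, threshold=5.0), detect_hang_up(hung_up, threshold=5.0))

# Inverting the symbol clock moves the sampling point by half a symbol.
print(detect_hang_up(str_samples(0.5 + 0.5), threshold=5.0))
```

In this model the hung-up clock samples near the zero crossings, so the integrated magnitude is close to zero, and inverting the clock restores sampling at the peaks.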


4. New burst demodulator 
4.1 Receive filter changing technique 

To increase the signal-to-noise ratios of
the CR loop (SNlcr) and STR loop (SNlstr),
we propose a new technique to sequentially
change the bandwidth of the receive filter according
to the preamble pattern. Three types of filters
are used (see Fig. 2). Type 1 is a low-pass filter
(Bw << Br/2) used during the CR portion. Type 2
is a bandpass filter (Bw << Br/2) used during the
STR portion, and type 3 is a root-rolloff filter
(Bw = Br/2) as is commonly used in a demodulator.
Because the preamble consists of a stationary
pattern, noise power is reduced without loss
of signal power by using the receive filter
appropriate for the signal preamble pattern.

Fig. 2. Receive filters: a) type 1, b) type 2, c) type 3.
(Vertical axis: attenuation (dB); Br/2 is marked on
the frequency axis.)


Filter modification is easy to implement.
A DSP demodulator usually uses a finite
impulse response (FIR) receive filter, also called
a digital transversal filter (DTF). The DTF's
characteristics are changed by changing its
coefficients, so we have only to prepare three
sets of coefficients, one for each receive filter.
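
A minimal sketch of switching coefficient sets follows. It is not the authors' filter design: the Hamming-windowed sinc prototype, the cosine modulation used to obtain a band-pass response, the chosen cutoffs, and all names (`windowed_sinc_lowpass`, `shift_to_bandpass`, `COEFFICIENT_SETS`, `fir_output`) are assumptions for illustration. A low-pass filter also stands in for the type 3 root-rolloff filter. Frequencies are in cycles per sample at 4 samples per symbol, so Br/2 corresponds to 1/8.

```python
import math

N_TAPS = 25   # illustrative tap count

def windowed_sinc_lowpass(cutoff, n_taps=N_TAPS):
    """Hamming-windowed sinc low-pass, normalised to unity gain at DC."""
    mid = (n_taps - 1) / 2.0
    taps = []
    for n in range(n_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [c / gain for c in taps]

def shift_to_bandpass(lowpass_taps, centre):
    """Move a low-pass prototype to a centre frequency by cosine modulation."""
    mid = (len(lowpass_taps) - 1) / 2.0
    return [2.0 * c * math.cos(2.0 * math.pi * centre * (n - mid))
            for n, c in enumerate(lowpass_taps)]

# One coefficient set per burst portion; only the table lookup changes at run time.
COEFFICIENT_SETS = {
    "CR": windowed_sinc_lowpass(1 / 48),                            # type 1: narrow low-pass
    "STR": shift_to_bandpass(windowed_sinc_lowpass(1 / 48), 1 / 8),  # type 2: band-pass at Br/2
    "DATA": windowed_sinc_lowpass(1 / 8),                           # stand-in for type 3
}

def fir_output(history, burst_portion):
    """One FIR output sample using the coefficient set of the current burst portion."""
    taps = COEFFICIENT_SETS[burst_portion]
    return sum(c * x for c, x in zip(taps, history))

print(round(fir_output([1.0] * N_TAPS, "CR"), 3))    # DC input through the low-pass
print(round(fir_output([1.0] * N_TAPS, "STR"), 3))   # DC input through the band-pass
```

The filter structure is identical for all three portions of the burst; selecting a different key in the table plays the role of addressing a different coefficient set.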

1) Low-pass filter (Bw << Br/2) for the CR
(type 1 filter)

Before proceeding, we must clarify the
relationship between the signal-to-noise ratio of
the receive filter output (SNRi) and the
signal-to-noise ratio of the loop bandwidth (SNl).
For carrier recovery, the signal-to-noise ratio of
the CR loop (SNlcr) is given by

$$ SNl_{cr} = SNpd_{cr} \cdot \frac{Bw}{Bl_{cr}}, \qquad (1) $$

where Br is the symbol rate, Bw the receive filter
bandwidth, and SNpdcr the signal-to-noise ratio
of the CR phase detector output. This relationship
is based on the work of F. M. Gardner.
SNpdcr depends on both SNRi and the order
of the phase detector. During the CR portion, the
CR has second-order nonlinearity because BPSK
demodulation is used, and SNpdcr is given by

$$ SNpd_{cr} = \frac{SNRi}{Rb(SNRi)} = \frac{SNRi}{1 + \dfrac{1}{2 \cdot SNRi}}, \qquad (2) $$

where Rb(SNRi) is the loss caused by nonlinearity.

We have already noted that phase detector
nonlinearity degrades SNlcr {see Equations (1)
and (2)}. If there is no nonlinearity {Rb(SNRi) =
1}, the type 1 filter cannot improve SNlcr, and
SNlcr is given by

$$ SNl_{cr} = \frac{SNRi \cdot Bw}{Bl_{cr}} = \frac{SNRi' \cdot Br}{2 \cdot Bl_{cr}} = \frac{Ksn}{Bl_{cr}}, \qquad (3) $$

$$ SNRi' = 2 \cdot E_b/N_0, \qquad (4) $$

where SNRi' is SNRi while using the type 3 filter
(Bw = Br/2) with Ksn constant. Bw does not
contribute to SNlcr.

In practice, some second-order nonlinearity
exists {see Equation (2)}, and SNlcr is given by

$$ SNl_{cr} = \frac{Ksn/Bl_{cr}}{Rb(SNRi)} = \frac{Ksn/Bl_{cr}}{Rb(Ksn/Bw)}. \qquad (5) $$

Bw contributes to SNlcr. Therefore the rate of
increase of SNlcr, Gcr, is given by

$$ G_{cr} = \frac{Rb(SNRi')}{Rb(SNRi)} = \frac{1 + (2 \cdot SNRi')^{-1}}{1 + (SNRi' \cdot Br/Bw)^{-1}}. \qquad (6) $$

Figure 3 shows Gcr.

Fig. 3. Rate of increase of SNlcr. (Curves for
Bw = Br/8, Br/16, and Br/32; axes: Gcr (dB)
versus SNRi (dB).)

Fig. 4. Rate of increase of SNlstr. (Curves for
Bw = Br/8, Br/16, and Br/32; axes: Gstr (dB)
versus SNRi (dB).)

2) Bandpass filter (Bw << Br/2) for STR and
kick-off circuits (type 2 filter)

STR has almost the same second-order
nonlinearity as CR. Moreover, the attenuation of
the type 3 root-rolloff filter at DC is 3 dB less
than at Br/2. Therefore, SNlstr is given by

$$ SNl_{str} = 2 \cdot SNpd_{str} \cdot \frac{Bw}{Bl_{str}} = \frac{2 \cdot Ksn/Bl_{str}}{Rb(SNRi)} = \frac{2 \cdot Ksn/Bl_{str}}{Rb(Ksn/Bw)}. \qquad (7) $$

The rate of increase of SNlstr, Gstr, is given by

$$ G_{str} = \frac{2 \cdot Rb(SNRi')}{Rb(SNRi)} = \frac{2 \cdot \{1 + (2 \cdot SNRi')^{-1}\}}{1 + (SNRi' \cdot Br/Bw)^{-1}}. \qquad (8) $$

SNlstr increases by more than 3 dB. Figure 4
shows Gstr.

4.2 Hardware configuration

The DSP demodulator consists of DTF,
STR, CR, and kick-off circuits (see Fig. 5). The
DTF, CR, and STR circuits are implemented in
highly integrated chips.

1) Digital transversal filter (DTF)

The DTF is an FIR filter consisting of an IC
and two ROMs (see Fig. 6). The input signal rate
is 4 samples/symbol and the output signal rate is
2 samples/symbol. The coefficients are symmetrical,
so 12 of the 25 coefficients repeat the values
of their mirror-image counterparts, because the
phase/frequency characteristic is
linear. The ROMs hold three addressable sets of
filter coefficients.

Fig. 6. Digital transversal filter.


2) Symbol timing recovery (STR) 

The frequency stability of the clock source
oscillator must be very high for satellite
communications. Since the clock frequency error
is negligible, the STR need only recover the
clock phase. The STR is a digital PLL
controlling the dividing ratio N of a digital VCO
(see Fig. 7). The STR consists of an IC and an
external clock running at 64 times the symbol
rate. The loop filter generates a control signal
for the VCO when the accumulated clock phase
error exceeds the ±Kth threshold. The threshold
is varied during the STR portion and, for
initial acquisition, the VCO output is
exclusive-ORed with the kick-off signal and inverted.

Fig. 7. Symbol timing recovery. (Blocks: phase
detector, loop filter with threshold ±Kth, and
digital VCO clocked at f = Br × 64; the kick-off
signal is applied at the VCO output.)

Fig. 8. Carrier recovery. (Blocks: complex
multiplier for the I and Q inputs, BPSK phase
detector, QPSK phase detector, loop filter, and
numerically controlled oscillator.)

Fig. 9. CR loop filter.

3) Carrier recovery (CR) 

The carrier recovery IC uses a Costas PLL
(see Fig. 8). Its loop filter is shown in Fig. 9. The
noise bandwidth is a function of x and y, both of 
which are varied during the CR portion of the 
burst. The phase detector is switchable between 
BPSK and QPSK: BPSK is used during the 
preamble portion, QPSK during the data portion. 
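
The threshold-type loop filter of the STR circuit in item 2) can be sketched as follows. The text states only that a control signal is generated when the accumulated phase error exceeds the threshold and that the dividing ratio N of the digital VCO is controlled; resetting the accumulator after each correction, stepping N by one count, and the sign convention are assumptions of this sketch, and the class name `ThresholdLoopFilter` is hypothetical.

```python
NOMINAL_DIVIDE = 64   # the external clock runs at 64 times the symbol rate

class ThresholdLoopFilter:
    """Accumulate phase-detector outputs and act when the threshold is exceeded."""

    def __init__(self, threshold_k):
        self.threshold_k = threshold_k   # may be changed during the STR portion
        self.accumulator = 0

    def update(self, phase_error):
        """Return the dividing ratio N to use for the next symbol period."""
        self.accumulator += phase_error
        if self.accumulator > self.threshold_k:
            self.accumulator = 0             # reset after a correction (assumption)
            return NOMINAL_DIVIDE - 1        # shorter period: the clock advances
        if self.accumulator < -self.threshold_k:
            self.accumulator = 0
            return NOMINAL_DIVIDE + 1        # longer period: the clock is retarded
        return NOMINAL_DIVIDE

loop_filter = ThresholdLoopFilter(threshold_k=3)
print([loop_filter.update(error) for error in (1, 1, 1, 1, -1, 0)])
```

A small threshold makes the loop react quickly, which suits initial acquisition, while a large threshold narrows the effective noise bandwidth for the data portion.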


4.3 Performance 

Fig. 10. Improvements in unique word miss
probability. (Curves: ideal; type 3; types 1 and 3;
types 1, 2, and 3; axes: Pmiss versus Eb/N0 (dB),
0 to 5 dB.)

Figure 10 shows the Pmiss performance of the
demodulator at Bwcr (the type 1 filter bandwidth)
= Br/12, Bwstr (the type 2 filter bandwidth) =
Br/10, Br = 480k symbols/s, Bucr (the CR upper
bound of the noise bandwidth) = 1 kHz, and
Bustr (the STR upper bound of the noise
bandwidth) = 17 kHz. Both Bucr and Bustr are
given by Equation (9). These Bu values mean
that Pcs should be less than 10-® at an Eb/N0 of
0 dB.


$$ P_{cs} = \exp(-\pi \cdot SNl) \cdot Bu/Br. \qquad (9) $$


A conventional demodulator with a type 3
filter only has hang-up problems, and its Pmiss
often jumps in value. A DSP demodulator with
type 1 and type 3 sequentially changing filters
eliminates hang-up and decreases Pmiss to about
5 % of the conventional value at an Eb/N0 of 3 dB.
A DSP demodulator sequentially changing all three
filter types decreases Pmiss to about 0.5 % (see
Fig. 10). Measured data meets the requirements
given in Chap. 2.


5. Application to the VSAT system for city and 
prefectural governments 
A VSAT system for city and prefectural
governments which uses our DSP demodulator is
shown in Fig. 11. Its system parameters are
listed in Table 2. Figure 12 shows the DSP
demodulator.

The purpose of this VSAT system is to
reliably transmit administrative information and
gather local information at any time, even in a
disaster.

Table 2. System parameters

                   Voice,      Broadcast               Channel control
                   facsimile   Inbound     Outbound    Inbound     Outbound
Satellite access   DA-FDMA     PA-TDMA     PA-TDMA     RA-TDMA     PA-TDMA
Modulation         QPSK        QPSK        QPSK        QPSK        QPSK
                   burst       burst       continuous  burst       continuous
Symbol rate        35k         35k         35k         35k         35k
Data rate          32 kb/s     32 kb/s     32 kb/s     32 kb/s     32 kb/s
Error correction   Viterbi     Viterbi     Viterbi     Viterbi     Viterbi
                   k = 7       k = 7       k = 7       k = 7       k = 7

Fig. 11. VSAT system. (Satellite, hub station, and
VSAT stations.)

One of the system's features is a voice
activation technique in the voice channel to
enable efficient use of the satellite's power.
Voice data is transferred as burst data, even if
the satellite uses frequency division multiple
access (FDMA). For low Eb/N0 operation, this
system requires burst demodulators with higher
stability than those needed by other VSAT systems.


6. Conclusion 

Achieving a low unique word miss probabili- 
ty (Pmiss) at a low cycle-skip rate is a significant 
task in all satellite communications systems. We 
propose an effective solution to this problem in a 
technique which improves the initial acquisition 
of both the CR and the STR circuits. The
method increases the S/N of the CR loop by
about 1 dB and that of the STR by at least 3 dB.
This means that the unique word miss probability
of our demodulator is at least 100 times lower
than that of a conventional demodulator at an
Eb/N0 of 3 dB. The DSP demodulator enables
VSAT systems to employ advanced FEC and
voice activation techniques.

Fig. 12— DSP demodulator.

The incorporation of background impurities (carbon and oxygen) and the n-type doping of
AlGaAs by gas-source molecular beam epitaxy (GSMBE) have been extensively studied
using triethylaluminum (TEAl) and trimethylamine alane as aluminum sources,
triethylgallium as the gallium source, cracked arsine as the arsenic source,
and uncracked disilane as an n-type dopant source. A carbon-doped base (p = 4 ×
10¹⁹ cm⁻³, 92.5 nm thick, trimethylgallium as the carbon source) GaAs/AlGaAs
heterojunction bipolar transistor was grown (TEAl as the aluminum source). A dc
current gain of 45 was obtained at a current density of 4 × 10⁴ A/cm² (4 × 5 µm²
emitter). Device characteristics under current stress were found to be stable.


1. Introduction 
Gas-source molecular beam epitaxy
(GSMBE) using metalorganic group III and
hydride group V sources has been used to grow
III-V compound semiconductors. In particular,
the ability to grow heavily carbon-doped GaAs
(1 x 10° cm~%) using trimethylgallium
((CH₃)₃Ga, TMGa) is one of the most attractive
features of GSMBE for device applications. Carbon
at high concentrations in GaAs has a low
diffusion coefficient compared to other p-type
dopants such as beryllium and zinc. GSMBE is
therefore promising for the growth of heavily
carbon-doped heterojunction devices such as
GaAs/AlGaAs heterojunction bipolar transistors
(HBTs). The ability to reproducibly control
the n-type doping of AlGaAs has been a major
concern in the practical application of GSMBE
to the growth of HBTs. Major problems in the
early stages of n-type doping of AlGaAs were:

1) the high background carbon concentration in
AlGaAs grown using triethylaluminum
((C₂H₅)₃Al, TEAl), triethylgallium
((C₂H₅)₃Ga, TEGa), and arsine (AsH₃) as source
gases, and

2) poor doping reproducibility using conventional
hot solid dopant sources such as
silicon and tin.

The background carbon concentration in
AlGaAs grown by GSMBE using the source
gases mentioned above was reduced to the order
of 10" cm~* by optimizing growth parameters
such as substrate temperature and the AsH₃
flow rate. We reported that disilane (Si₂H₆)
is a promising n-type cold gaseous dopant source
for the GSMBE growth of n-AlGaAs,
and demonstrated the first growth of a
GaAs/AlGaAs HBT by GSMBE using only
gaseous sources. Recently, the background
carbon concentration was further reduced to
the order of 10" cm-* by using new aluminum
precursors such as trimethylamine alane
((CH₃)₃NAlH₃, TMAAl) and tri-isobutylaluminum
((C₄H₉)₃Al, TIBAl), which have
led to excellent n-type doping controllability in
practical application.

In this paper, we describe our recent
GSMBE study of the incorporation of the
background impurities carbon and oxygen into
AlGaAs, which affects doping characteristics, and
of the n-type doping of AlGaAs, using TEAl and
TMAAl as aluminum sources, TEGa as the
gallium source, cracked AsH₃ as the arsenic
source, and uncracked Si₂H₆ as an n-type
dopant source. We also report on the p-type
doping of GaAs using TMGa as both the dopant
and the host material source. Finally, we report
on the growth of carbon-doped base GaAs/AlGaAs
HBTs using TEAl as the aluminum
source, and discuss device characteristics and
stability against current stress.


2. Experiments 

AlGaAs epilayers were grown on (100)-oriented
semi-insulating GaAs substrates in a
VG80H MBE growth chamber using a specially
designed gas handling system. All metalorganic
sources were introduced into the growth
chamber without the use of a carrier gas. The
flow rates of the metalorganics were controlled
by a differential pressure control method.
Undiluted (100 %) AsH₃ was used as the group
V source and was cracked at 1100 °C in
a low-pressure cracking cell. Si₂H₆, diluted to
10 % in H₂, was used without precracking.
The growth rate was 0.74 to 1.24 µm/h for
AlₓGa₁₋ₓAs (x = 0 to 0.40). The chamber pressure
during growth was 5X 10°° to 15 107‘
torr. The carrier concentration of n-AlGaAs
epilayers was determined by conventional C-V
measurements. Atomic impurity concentrations
were evaluated by secondary ion mass spectroscopy
(SIMS) using Cs⁺ primary ions. Calibration
was done by comparison with silicon-, oxygen-,
and carbon-implanted MBE-grown
GaAs and Al₀.₃Ga₀.₇As references.


3. Results and discussion 
3.1 Dependence of impurity incorporation in
AlGaAs on growth condition

We studied the incorporation of carbon and
oxygen background impurities in AlGaAs by
varying the growth conditions. Figure 1 shows the
variation of the background carbon concentration
with substrate temperature, Tsub, and the
AsH₃ flow rate for undoped Al,,Ga,,;As grown
using TEAl (circles) and TMAAl (squares). The
carbon concentration shows a weak V-shaped
dependence on the substrate temperature when
TEAl is used as the aluminum source at an
AsH₃ flow rate of 2 sccm (closed circles). A
minimum carbon concentration of 1.5 x 10" cm~*
was obtained at 610 °C. This substrate temperature
dependence resembles that reported for
GaAs, although AlGaAs shows a weaker
variation. The decrease in carbon concentration
with increasing Tsub up to 610 °C is most likely
due to the reduced number of incompletely
decomposed metalorganic molecules {probably
Ga(C₂H₅) or Al(C₂H₅)}. Part of the third ethyl
radical of TEGa or TEAl remains undecomposed
on the substrate surface due to the low substrate
temperature, leading to carbon incorporation.
This model was used to explain carbon incorporation
in GaAs epilayers when trimethylgallium
((CH₃)₃Ga, TMGa) is used as a gallium
source. In the higher temperature range, the
increased carbon concentration may be due to:

1) an increased rate of pyrolysis of the C-C
bond in monoethylgallium or monoethylaluminum,
which remain adsorbed on the
substrate surface (β-methyl elimination
reaction), and/or

2) the thermal decomposition of ethyl radicals,
which are by-products of TEAl or TEGa.

In the case of 1), the increased C-C bond
dissociation {M(C₂H₅) → M(CH₂) + CH₃;
M = Ga, Al} leaves molecules like M(CH₂) on
the substrate surface. The CH₂ radical part of
Ga(CH₂) or Al(CH₂) adsorbed on the substrate
surface then proceeds to occupy an arsenic site,
leading to carbon incorporation in the AlGaAs
epilayer.

Fig. 1— The variation of the background carbon
concentration with Tsub and AsH₃ flow
rate for undoped AlGaAs grown using
TEAl and TMAAl.

At a low Tsub of 520 °C, carbon incorporation
decreases notably with an increasing AsH₃
flow rate. A carbon concentration of 6 X 10" cm~*
at 2 sccm was reduced to 1 X 10" cm~* at
7 sccm. Above 610 °C, however, the carbon
concentration was almost invariant for all AsH₃
flow rates, suggesting that the hydrogen from
cracked AsH₃ (molecules and/or radicals) may
react with the metal-carbon bond of
monoethylgallium or monoethylaluminum and
enhance the dissociation of the third ethyl
radical bond, leaving metal atoms.

Carbon concentrations for TMAAl are
almost independent of substrate temperature in
the range studied. A minimum carbon concentration
of 7 X 10" cm~* was obtained at Tsub
between 520 and 580 °C and an AsH₃ flow rate of
4 sccm. The carbon concentration in AlGaAs
grown using TMAAl was more than an order of
magnitude lower than that using TEAl. The low
carbon concentration over the wide substrate
temperature range has the advantage of enabling
us to choose the optimum temperature for
practical device structures. Trimethylamine
(TMA) appears to be easily removed from the Al
atom, and the C-N bond is thermodynamically
more stable than the C-C bond. Decreasing the
AsH₃ flow rate to 2 sccm increased the carbon
concentration, suggesting that the carbon
concentration could be effectively reduced by
increasing the AsH₃ flow rate.

Figure 2 shows the dependence of the
oxygen concentration on substrate temperature
for undoped AlₓGa₁₋ₓAs grown using TMAAl
(open squares, x = 0.34, AsH₃ = 4 sccm) and
TEAl (closed circle, x = 0.8, AsH₃ = 4 sccm),
together with previously published data for
GSMBE using the same combination of source
gases (closed squares, TMAAl, TEGa, and AsH₃,
x = 0.42, AsH₃ = 5 sccm). The present data using
TMAAl show a constant oxygen concentration
of 4 X 10" cm~$, the same as that using TEAl,
and about an order of magnitude lower than the
previously reported data. The oxygen in our
experiments is presumed to come from the gas
supply line, not from TEAl and TEGa, as
described in the previous report.

Fig. 2—The dependence of the oxygen concentration
on Tsub for undoped AlGaAs.
The data of Abernathy et al. are for AlGaAs
grown using the same combination of
source gases (TMAAl, TEGa, and AsH₃).

Fig. 3— The variation of oxygen and carbon
concentration in n-AlGaAs with the AlAs
mole fraction.

Figure 3 shows the variation of carbon
concentration (open squares) and oxygen
concentration (open circles) in Si-doped
n-AlGaAs grown using TMAAl with the AlAs mole
fraction. Epilayers were grown at a substrate
temperature of 580 °C and an AsH₃ flow rate of
4 sccm. Data using TEAl as the Al source
(closed circle) are also plotted for comparison.
The oxygen concentration was between 3 x 10"
and 4 X 10" cm~* over the entire AlAs mole
fraction range investigated, which is the same as
that using TEAl.

The carbon concentration increases gradually
with increasing AlAs mole fraction.
This is also seen in GSMBE-grown AlGaAs using
TEAl or TIBAl as an Al source gas, but the
carbon concentration shown in Fig. 3 is over an
order of magnitude lower than that using TEAl
(10 em).

Oxygen and carbon concentrations in
n-GaAs grown under the same conditions were
below the SIMS detection limit, that is, less than
5 X 10 cm~* for oxygen and less than 2 x 10"
cm * for carbon.


3.2 Doping characteristics of n-AlGaAs using
Si₂H₆

The dependence of the carrier concentration of
n-AlGaAs on Si₂H₆ flow rate, the Si incorporation
efficiency into AlGaAs, and the electrical
activation efficiency of n-AlGaAs were studied.
Figures 4a) and b) show the variation of the
carrier concentration with diluted Si₂H₆ flow
rate for n-AlₓGa₁₋ₓAs (x = 0-0.40) grown using
TEAl and TMAAl. The substrate temperature
was 580 °C and the AsH₃ flow rate was 4 sccm.
The carrier concentration plotted is normalized
to a growth rate of 1 µm/h. The results showed that
GaAs can be controllably doped from 1 = 10" to
3 xX 10 cm~*% at normal growth temperatures.
With TEAl, as shown in Fig. 4a), the carrier
concentration of an AlₓGa₁₋ₓAs
(x = 0.238-0.28) epilayer is always much lower than
that of the corresponding GaAs epilayer for the same
Si₂H₆ flow rate. This difference is attributable
to the larger background carbon concentration
in AlGaAs than in GaAs.

Our results also show that Si₂H₆ dissociates
much more readily than SiH₄, whose pyrolysis
had to be enhanced by precracking the SiH₄ with a
tantalum filament to obtain electron concentrations
above 10"cm~*.

In contrast, with TMAAl, as shown in Fig. 4b), the
carrier concentration of n-AlₓGa₁₋ₓAs (x =
0.09-0.27) was reproducibly controlled between
5X 10" and 3 X 10 cm~* by varying the Si₂H₆
flow rate from 1 to 8 sccm. The carrier
concentration of AlGaAs showed the same Si₂H₆
flow rate dependence as that of GaAs for
aluminum compositions of 0-0.27. This is because
the use of TMAAl significantly reduced the
concentration of carbon acceptors, as shown in
Figs. 1 and 3. The carrier concentration decreased
when the aluminum composition increased from
0.27 to 0.40 at a constant Si₂H₆ flow rate of 1
sccm. However, compensation by carbon
acceptors is thought to have a negligible effect
on the aluminum composition dependence of the
carrier concentration, because the hole concentration
originating from carbon acceptors is low (10"
cm~*) even at x = 0.4. The decrease in carrier
concentration for x greater than 0.27 is discussed
later.

Fig. 4— The variation of the carrier concentration
with Si₂H₆ flow rate of n-AlGaAs grown
using TEAl and TMAAl.

Fig. 5— The variation in Si atomic concentration
with the Si₂H₆ flow rate for n-AlGaAs
using TEAl and TMAAl.

To investigate the dependence of the Si atomic
concentration of n-AlGaAs on the Si₂H₆ flow
rate, SIMS measurements were performed on
the same samples as in Figs. 4a) and b). Figure 5
shows variations in Si atomic concentration with
the Si₂H₆ flow rate for n-AlₓGa₁₋ₓAs (x =
0-0.4) grown at a Tsub of 580 °C and an AsH₃
flow rate of 4 sccm using TEAl (closed symbols)
and TMAAl (open symbols). The Si atomic
concentration plotted is normalized to a growth
rate of 1 µm/h. The Si atomic concentration in
the GaAs epilayer (open circles) is almost
proportional to the Si₂H₆ flow rate. The Si
incorporation efficiency, Ei, was estimated to be
1.9 × 10⁻⁴, using the relationship:


$$
\frac{\text{Si atomic concentration}}{\text{Ga atomic concentration}}
= \frac{E_i \times \left(\mathrm{Si_2H_6}\ \text{flow rate} \times 10\,\% \times 2\right)}
       {\text{calculated TEGa flow rate}}
$$
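
The relationship above can be turned into a small calculation. The following is a minimal sketch, not the authors' procedure: it assumes the 10 % factor is the Si₂H₆ dilution in H₂ and the factor 2 is the number of Si atoms per Si₂H₆ molecule, reads the quoted efficiency as 1.9 × 10⁻⁴, and uses a hypothetical TEGa flow rate and the standard Ga site density of GaAs, neither of which is given in the text.

```python
def silicon_supply_sccm(si2h6_flow_sccm, dilution=0.10, si_per_molecule=2):
    """Si atom supply implied by the diluted Si2H6 flow (assumed: 10 % Si2H6, 2 Si per molecule)."""
    return si2h6_flow_sccm * dilution * si_per_molecule


def incorporation_efficiency(si_conc, ga_conc, si2h6_flow_sccm, tega_flow_sccm):
    """Solve the relationship for the incorporation efficiency E_i."""
    return (si_conc / ga_conc) * tega_flow_sccm / silicon_supply_sccm(si2h6_flow_sccm)


def predicted_si_concentration(e_i, ga_conc, si2h6_flow_sccm, tega_flow_sccm):
    """Si atomic concentration expected for a given E_i and pair of flow rates."""
    return e_i * ga_conc * silicon_supply_sccm(si2h6_flow_sccm) / tega_flow_sccm


GA_SITE_DENSITY = 2.21e22  # Ga atoms per cm^3 in GaAs (standard value, not from the paper)
TEGA_FLOW_SCCM = 1.0       # hypothetical TEGa flow rate, for illustration only

si_conc = predicted_si_concentration(1.9e-4, GA_SITE_DENSITY, 2.0, TEGA_FLOW_SCCM)
print(f"Si concentration at 2 sccm diluted Si2H6: {si_conc:.3e} cm^-3")
```

With a fixed TEGa flow rate the predicted Si concentration is proportional to the Si₂H₆ flow rate, which is the behavior described for the GaAs epilayers.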

The corresponding Ei for AlGaAs epilayers
with x ≈ 0.1 is the same as that of GaAs, but Ei
decreases from 1.9 × 10⁻⁴ to 1.38 × 10⁻⁴ when the
Al content is increased from 0 to 0.3. The
incorporation of Si into AlₓGa₁₋ₓAs (x > 0.2)
decreases with an increasing AlAs mole fraction.
This was observed for both TEAl and TMAAl. The
decreased carrier concentration of n-AlₓGa₁₋ₓAs
(x = 0.27-0.40) in Fig. 4b) is probably due to the
decreased Si atomic concentration. However, this
behavior of Si incorporation into AlGaAs differs
from that in low-pressure (LP) MOVPE (78 torr),
in which the Si atomic concentration does not
depend on the Al content of the epilayers. This
may be due to differences in pressure during
growth. That is, in LP-MOVPE, thermal
decomposition of Si₂H₆ proceeds before the Si₂H₆
molecules reach the growth surface. In GSMBE (107! to
10-® torr), Si incorporation is limited by the
decomposition of Si₂H₆ only at the growth
surface, and so decomposition is sensitive to the
AlAs mole fraction of the growth surface layer.

The dependence of the carrier concentration
of n-AlₓGa₁₋ₓAs (x = 0-0.40) on the Si atomic
concentration, for epilayers grown at a Tsub of
580 °C and an AsH₃ flow rate of 4 sccm using TEAl
(closed symbols) and TMAAl (open symbols), is shown
in Figs. 6a) and b), using the data of Figs. 4a), b)
and 5. Data for n-GaAs and n-AlₓGa₁₋ₓAs (x =
0.2-0.3) grown by conventional MBE are also
plotted for comparison (shaded symbols). The
carrier concentration for MBE-grown epilayers
is clearly proportional to the Si atomic concentration,
and all incorporated Si atoms are
electrically active as donors; that is, the
activation efficiency is one. The carrier concentration
of GSMBE-grown GaAs from 1.5 =< 10"
cm~* to 3 X 10% em~’ (open circles) is almost
proportional to the Si atomic concentration, and
more than 60 % of the Si atoms in GaAs are
electrically activated. In Fig. 6a), the carrier
concentration drop of AlGaAs below 10" cm is
mainly due to carbon acceptor compensation.
The reason for the low activation efficiency in the
carrier concentration region greater than 10"
cm * is not clear.
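
The two quantities discussed above, activation efficiency and carbon acceptor compensation, can be illustrated with a short calculation. This is a simplified sketch under stated assumptions, not the analysis used in the paper: it assumes every carbon acceptor removes one electron, and all numerical values are hypothetical round numbers chosen for illustration.

```python
def activation_efficiency(carrier_conc, si_conc):
    """Fraction of incorporated Si atoms that act as donors (1.0 means full activation)."""
    return carrier_conc / si_conc


def net_electron_concentration(si_conc, activation, carbon_acceptor_conc):
    """Simple compensation model (an assumption): activated donors minus carbon acceptors."""
    return max(activation * si_conc - carbon_acceptor_conc, 0.0)


# Hypothetical values in cm^-3, for illustration only.
print(activation_efficiency(6e17, 1e18))
print(net_electron_concentration(1e18, 1.0, 2e17))
print(net_electron_concentration(1e17, 0.6, 2e17))
```

In this model the carrier concentration collapses once the activated Si concentration falls below the carbon acceptor concentration, which is the qualitative behavior attributed to compensation in the TEAl-grown AlGaAs.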

In Fig. 6b), note that the carrier concentration
of AlGaAs grown using TMAAl shows
the same linear dependence as GaAs, quite
different from that of AlGaAs grown using TEAl, in
which the carrier concentration is always
smaller than that of GaAs at the same Si atomic
concentration, as shown in Fig. 6a). The
activation efficiency of Si is more than 60 % for
AlGaAs grown using TMAAl, and most
incorporated Si atoms seem to be activated as
shallow donors even at concentrations up to 3
10° cm~*. These results indicate that the doping
controllability of n-AlGaAs is significantly
improved using TMAAl, particularly in the case
of lower carrier concentrations and higher AlAs
mole fractions.

Fig. 6—The dependence of the carrier concentration
of n-AlGaAs (x = 0-0.4) on the Si
atomic concentration using TEAl and
TMAAl.

Fig. 7— The variation of GaAs hole concentration
with substrate temperature and V/III ratio.


3.3 p-Type doping of GaAs using TMGa 

TMGa is a promising p-type dopant source
for the GSMBE growth of GaAs. We studied
the p-type doping of GaAs using TMGa as both
the dopant and the host material source and
AsH₃ as the arsenic source. Figures 7a) and b)
show the variation of the p-type GaAs epilayer
hole concentration and mobility with substrate
temperature and V/III ratio, respectively. All the
p-type epilayers had excellent mirror surface
morphologies for all growth conditions studied.
The epilayer mobilities varied from 100 cm²/Vs
at a hole concentration of 6.3 < 10'* cm~* to 60.5
cm²/Vs at a hole concentration of 1.3 x 10”
cm~*. The GaAs hole concentration is seen to be
relatively insensitive to changes in the substrate
temperature between 550 and 610 °C. However, the
hole concentration is strongly dependent on the
V/III ratio, as shown in Fig. 7b). The epilayer
mobilities are considered to be comparable to or
better than those of p-type GaAs epilayers having
similar doping levels grown by other growth methods.
The strong V/III ratio dependence and the relative
independence of growth temperature
under these growth conditions imply that
the incorporation of the acceptor impurity
(carbon) is limited by the reaction of the TMGa
with AsH₃ by-products, and is not only pyrolytically
limited as was reported in the case of
TMGa-As,”. Our results show that the use of
the TMGa-AsH₃ combination allows easy
and reproducible control of the epilayer hole
concentration by simply adjusting the TMGa
and AsH₃ flow rates, which is simpler and more
accurate than adjusting the substrate
temperature. Thus, by the appropriate choice of
growth conditions, the GaAs epilayer hole
concentration can be easily varied between 6.3 *
10" and 1.3 < 10” cm-°.


3.4 Growth and characterization of HBT
structure

We grew a carbon-doped base GaAs/AlGaAs
HBT using TEAl as the aluminum
source. The HBT structure is shown in Fig. 8.
The entire structure was grown at a substrate
temperature of 580 °C, chosen to minimize
carbon compensation in the AlGaAs emitter
layer. The Al,,Gay,As emitter layer, doped
with silicon from Si₂H₆ to a carrier concentration
of 9 X 10" cm~*, has an abrupt junction. A
GaAs base layer was doped with carbon to a
carrier concentration of 4 × 10¹⁹ cm⁻³ using
TMGa as both the Ga source and the carbon
source. The base layer is 92.5 nm thick. A 7.5-nm
thick undoped GaAs spacer layer was grown
between the emitter and base layers. The AsH₃
flow rate was kept constant at 4 sccm except for
the base layer, for which the flow rate was reduced
to 3 sccm to increase the carbon incorporation
from TMGa. The carbon concentration is well
controlled by varying the AsH₃ flow rate, as
previously reported. Detailed growth conditions
have been described in another paper.
Conventional wet chemical etching and lift-off
were used to fabricate HBTs. A CVD-SiO₂ film
was deposited for surface passivation. The
emitter and collector ohmic contact metals were
AuGe/Au, alloyed at 400 °C in N₂ gas. The base
ohmic contact metal was nonalloyed Cr/Au.

Fig. 8— Carbon doped base HBT structure.

Figure 9 shows a SIMS depth profile of the
HBT structure. Carbon, silicon, and matrix
elements were measured. The atomic carbon
concentration in the base layer is almost the
same as the carrier concentration of 4 × 10¹⁹
cm⁻³. The carbon profile shows a sharp drop at
the emitter-base interface within the depth
resolution of the SIMS, in this case, about 7.5
nm. This suggests that a well-defined interface
was obtained using Si from disilane as an
emitter dopant source and carbon from TMGa
as a base dopant source.

Fig. 9— SIMS depth profile for a HBT structure.

Fig. 10—Common-emitter I-V characteristics.

Fig. 11—Gummel plots for a carbon doped base
HBT.

Figure 10 shows the common-emitter I-V
characteristics of the HBT having an emitter of
4 × 5 µm². A dc current gain of 45 was obtained
at a collector current density of 4 × 10⁴ A/cm²
and an emitter-collector voltage of 2.5 V,
suitable for practical IC fabrication. The
transistor had a turn-on voltage of about 0.2 V.
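
As a worked check of the figures above, the collector and base currents implied by the quoted gain can be computed directly. This is a minimal sketch, reading the quoted collector current density as 4 × 10⁴ A/cm² and the emitter as 4 × 5 µm², and assuming the current density refers to the emitter area; the function names are illustrative only.

```python
def emitter_area_cm2(width_um, length_um):
    """Convert emitter dimensions in micrometres to an area in cm^2 (1 um = 1e-4 cm)."""
    return (width_um * 1e-4) * (length_um * 1e-4)


def collector_current_a(current_density_a_per_cm2, area_cm2):
    """Collector current for a given current density and emitter area."""
    return current_density_a_per_cm2 * area_cm2


def base_current_a(collector_current, dc_gain):
    """Base current implied by the dc current gain, gain = Ic / Ib."""
    return collector_current / dc_gain


area = emitter_area_cm2(4.0, 5.0)
i_c = collector_current_a(4e4, area)
i_b = base_current_a(i_c, 45.0)
print(f"emitter area = {area:.1e} cm^2, Ic = {i_c * 1e3:.1f} mA, Ib = {i_b * 1e3:.3f} mA")
```

Under these assumptions the quoted operating point corresponds to a collector current of about 8 mA, close to the 10 mA emitter current used in the current stress measurements.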

Figure 11 shows the Gummel plot of the
transistor. The base current ideality factor was
determined to be 1.47, suggesting that the
generation-recombination current at the emitter-base
junction is very small, and the emitter-base
interface is well-defined. The ideality factor for
the emitter-base diode was also measured to be
1.12, indicating high-quality GSMBE growth at
the emitter-base junction. The superior quality of
our HBT is seen by comparing these values with the
ideality factors of 1.4 and 1.31 obtained with tin
and TESn, respectively, as emitter dopant sources.
This advantage
may be ascribed to the use of Si from Si₂H₆ as
an emitter dopant source, which is a well-behaved
impurity in AlGaAs.
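
An ideality factor such as those quoted above is commonly extracted from the slope of a Gummel plot. The following is a minimal sketch of that standard extraction, assuming the usual exponential diode law I = Is exp(qV / nkT) and a temperature of 300 K; it is not taken from the paper, and the saturation current used to generate the test points is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23     # Boltzmann constant
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge


def thermal_voltage(temperature_k=300.0):
    """kT/q in volts."""
    return BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C


def ideality_factor(v1, i1, v2, i2, temperature_k=300.0):
    """Ideality factor n from two points on the exponential part of an I-V curve."""
    return (v2 - v1) / (thermal_voltage(temperature_k) * math.log(i2 / i1))


def diode_current(voltage, n, saturation_current=1e-20, temperature_k=300.0):
    """Ideal exponential diode law, used here only to generate synthetic points."""
    return saturation_current * math.exp(voltage / (n * thermal_voltage(temperature_k)))


# Synthetic base-current points generated with n = 1.47, then recovered from the slope.
v_a, v_b = 0.9, 1.0
n_est = ideality_factor(v_a, diode_current(v_a, 1.47), v_b, diode_current(v_b, 1.47))
print(f"recovered ideality factor: {n_est:.2f}")
```

In practice the two points would be replaced by a fit over the straight-line region of the measured plot, since series resistance and leakage distort the ends of the curve.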

We studied the electrical stability of a
carbon-doped-base HBT under current stress.
The Be-doped-base HBT grown by conventional
MBE reportedly shows a large device degradation
due to Be diffusion from the base to the
emitter. Carbon is well known to be very
stable under thermal stress, but its electrical
stability under current stress is not so clear.
Current stability was measured at room temperature
in the common-base configuration
to keep the emitter current constant at 10 mA. In
the Gummel plot, the current shift toward higher
base-emitter voltage often observed for Be-doped
samples is attributed to the drift of the junction
toward the wide-gap AlGaAs layer due to Be
diffusion. In contrast, a GSMBE-grown carbon-doped
base HBT with an emitter size of 5 × 5
µm² shows no change in the Gummel plot even
after 10 hours of current stress. This indicates
the absence of base dopant diffusion due to
electrical stress.

The collector current, Ic, and base current,
Ib, of the carbon-doped-base HBT were measured
as a function of stress time with the emitter
current kept constant at 10 mA, which is the
operating current commonly used for practical
applications. The variation of Ic/Ib with current
stress time, which expresses the variation in
gain, is shown in Fig. 12. The gain was almost
constant over the investigated duration of up to
10 hours, suggesting no significant degradation
of device performance for the GSMBE-grown
carbon-doped base HBT. These results suggest
that carbon is stable under thermal stress during
growth and under current stress.

Fig. 12—Variation of gain (Ic/Ib) with electrical
stress time.


4. Conclusion

We compared the incorporation of oxygen
and carbon background impurities into AlGaAs
and the n-type doping of AlGaAs with uncracked
Si₂H₆ as an n-type dopant source, using TEAl
and TMAAl as aluminum sources for GSMBE
using only gaseous sources. It was shown that
carrier concentrations in n-AlGaAs can be well
controlled around 10" cm ~* using TEAl and
from 2 X 10" cm™* to 3 X 10% cm™* using
TMAAl. The GaAs epilayer hole concentration
could be easily controlled between 6.3 < 10" and
1.3 < 10” cm~* by varying the V/III ratio. We grew
a carbon-doped base (p = 4 × 10¹⁹ cm⁻³, 92.5 nm
thick, TMGa as the carbon source)
GaAs/AlGaAs HBT having a silicon-doped
emitter layer (n = 9 X 10" cm~’, TEAl as the
aluminum source). The emitter-base ideality
factor was 1.12, indicating that an excellent
junction was formed using silicon and carbon.
The dc current gain was 45 at a current density
of 4 × 10⁴ A/cm² (4 × 5 µm² emitter). Device
characteristics under current stress were found to
be stable. These results demonstrate the great
potential of GSMBE for HBT applications.

This paper describes Fujitsu’s research on an advanced service creation environment
for intelligent networks. It introduces a service creation environment that
enables rapid and easy development of new telecommunication services by
personnel who are not experts in switching software. An application oriented service
specification language based on a state transition model is proposed. An abstracted
switching model and service independent processing components are used in this
language to simplify specification description. This paper also discusses service
development support functions (creation, verification, and translation of the
specification) and outlines a prototype system.


1. Introduction 

The use of advanced telecommunication 
services, for example, multimedia, intelligent 
control, and personalized communications, is 
expected to increase. These services are 
implemented by combining a telecommunication 
network with information processing functions 
by using computers connected to the network. 
Since service execution is controlled by 
software, the realization of these services 
involves a large amount of software design and 
development. Currently, service software is 
developed only by switching-software engineers 
of the telecommunications system vendors. 
However, for the rapid introduction of new 
services, carriers and subscribers (end users) 
must also be able to develop service software. 
To meet this requirement of “customer programmability”,
an improved service creation environment
(SCE) is necessary. Recently, a network
architecture called the intelligent network (IN)
has been introduced, and workstations are
improving at a remarkable rate. The SCE is
implemented on workstations and creates
service software for the IN.

Chapter 2 of this paper looks at the basic
reasons and technical requirements for providing 
customer programmability. Chapters 3 and 4 
describe an application oriented service specifi- 
cation language and service development support 
functions which enable flexible service develop- 
ment. Chapter 5 describes a prototype system 
based on the concepts and components described 
in the chapters above. 


2. Service creation environment (SCE) 
2.1 Intelligent network (IN) 

The target network architecture of this 
research is the intelligent network (IN). The IN 
architecture makes it easier to develop service 
processing programs. Also, because service
processing is separated from basic call
processing, the IN architecture is expected to
simplify and speed up the development of
services. Figure 1 shows the architecture of the
IN. It consists of three nodes: the service 
switching point (SSP), service control point 
(SCP), and service management system (SMS). 
The SSP performs the basic call processing that
provides the ordinary switching functions. The
SCP is a high-performance, fault-tolerant
computer that executes application service
processing without affecting the basic call
processing. The SSP and SCP communicate with
each other via a standard message interface.
The SMS manages information for service
control. The creation, management, and
operation of services are performed on the SMS.
The services created by the SCE residing in the SMS
are downloaded to the SCP for execution.

SSP : Service switching point
SCP : Service control point
SMS : Service management system
SCE : Service creation environment

Fig. 1—IN architecture.

FUJITSU Sci. Tech. J., 29, 2, pp. 180-188 (June 1993)

J. Maeda et al.: Service Creation Environment Based on Application Oriented Specification Language


2.2 Objective 

Customer programmability is implemented
at the data level, which is mainly used to define
service parameters, and at the logic level, which is
used to create the service logic
(program). Programming at the logic level is difficult
because it requires a knowledge of telecommunications
and information processing. Although
customer programmability at the data level has
been developed, especially for free phone type
services, it is still restricted. To enable the
flexible development of services, service logic
creation is necessary. Also, when there is
a large number of differing service
requirements, service logic development should
be performed not only by switching software
experts but also by non-experts; for example,
the carrier’s network operators and the
computer-side systems engineers. The objective
of the SCE discussed in this paper is customer
programmability, including service logic creation
by non-experts.

Fig. 2—SCE configuration.

Services are generally developed in the 
following stages: 
1) planning, 
2) preparation of the service specifications, 
3) creation of related programs and data, and 
4) installation of the program into the network. 
Verification is performed at each stage. 
Although the SCE supports all of the develop- 
ment stages, this paper focuses on the service 
specification creation phase because it is 
regarded as the key issue affecting customer 
programmability. 


2.3 Requirements 

Figure 2 outlines the SCE configuration. The 
SCE performs two main functions: service logic 
creation and data creation. 

The application oriented specification
language is a key technology in service logic
creation. Because users are non-experts, they
cannot take into consideration the technical
aspects of the service execution control logic.
Therefore, the user must be provided with a high
level, easy-to-understand specification language.
Users should be able to perform all service
development tasks with only a knowledge of this
language. It is also necessary to reduce the
complexity of the service logic.

The service development process should be 
improved by using a rapid prototyping method 
that reduces the time required to find errors and 
complete the development. The service specifica- 
tion is first defined by using the service 
specification language. Then, the specification is 
verified, and if an error is found, the process is 
repeated. When the specification is completed, it 
is translated into an executable program. 
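
The define, verify, repeat, and translate cycle described above can be sketched as a short loop. This is an illustrative outline only, not the SCE's actual mechanism: the function name and the three callbacks (`verify`, `revise`, `translate`) are hypothetical placeholders for the verification, editing, and translation facilities.

```python
def develop_service(spec, verify, revise, translate, max_rounds=10):
    """Repeat verification and revision until the specification is error-free, then translate it."""
    for _ in range(max_rounds):
        errors = verify(spec)
        if not errors:
            # The specification is complete: produce the executable program.
            return translate(spec)
        # An error was found: revise the specification and verify again.
        spec = revise(spec, errors)
    raise RuntimeError("specification still has errors after the allowed number of rounds")


# Toy usage: the specification is a list of state names and must contain an "end" state.
def toy_verify(spec):
    return [] if "end" in spec else ["missing end state"]


def toy_revise(spec, errors):
    return spec + ["end"]


def toy_translate(spec):
    return " -> ".join(spec)


program = develop_service(["idle", "talk"], toy_verify, toy_revise, toy_translate)
print(program)
```

Because errors are caught at the specification level, each round of the loop is cheap compared with finding the same error after translation and installation.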

To support this process, the following 
facilities are required: 

1) Support for the description of the service 
logic specifications, including automatic 
generation of the specification. 

2) Verification by testing at the service logic 
specification level. 

3) Automatic translation of the service logic
specification into an executable program.

Primitive data representations such as
relational databases are not appropriate for
non-expert users. Therefore, a higher level of 
data representation, that is, one that is concep- 
tually matched to the user’s view of IN service 
processing, should be provided for easy data 
creation. Such a representation must not be 
service-specific. Data is created by using data 
definition and data registration functions, and 
support facilities for both functions should be 
provided. 

Since the service creation environment is an 
interactive system, the quality of the human 
interface is an important factor. Therefore, 
advanced interface facilities, for example, 
graphic and sound interfaces, should be used to 
provide an intuitive and user-friendly interface. 
Because users prefer differing types of inter- 
faces, for example, some users prefer to enter 
data using an interactive method whereas others 
prefer to fill in a table, the users should be able 
to adapt the interface to suit their own 
preferences. 


3. Service specification language

From the requirements described above, a 
formal language based on the state transition 
model is a suitable choice for the service 
specification language. 

The behavior of the SSP can be regarded as 
a basic call model from the viewpoint of the 
SCP. This model is defined as a state transition 
representation. The service logic controls service 
execution along the basic call model. Therefore, 
the specification describes the service logic 
control flow based on the state transition. This 
description is well-known, easy-to-understand, 
and can be translated directly into an executable 
program. A formal language is needed for 
automatic translation”. 

The state transition model consists of the 
state and the state transition. The state repre- 
sents the state of the call model, which is an 
abstracted representation of network capability. 
The call model simplifies the description by 
hiding the details of switching. Communication 
resources such as terminals, connection states, 
and connection paths are represented as commu- 
nication subjects, connection points, or legs. To 
enable extension to a broadband ISDN, connec- 
tion points have connection type attributes that 
specify whether one-way, two-way, or multi-way 
connections are possible. Communication sub- 
jects connect multiple legs, making multimedia 
service descriptions possible. Figure 3 shows an 
example state of the call model. The figure 
shows a two-way voice path between three com-
munication subjects, and a one-way video path
between two of the communication subjects”.

Fig. 4—Service components.

The state transition is represented by the 
service components, which are service independ- 
ent processing elements”. The service com- 
ponents must be easy for the user to understand 
and must indicate simple, intuitive elements of 
service behavior. 

Because of the need for modularity, the 
service components are based on an object-
oriented concept. The objects consist of the call
model, data, and related basic operations that 
are commonly used in many services (see Fig. 4). 
The service components are commands that 
invoke the objects. Some examples of these 
commands are connection and disconnection of 
legs, data authorization, and data translation. 
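
As a rough illustration of the call model and the service components described above, the following Python sketch represents a state as communication subjects joined to connection points by legs, and treats two service components as commands that change that state. All class, field, and function names are hypothetical and are not taken from the SCE.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConnectionType(Enum):
    """Connection type attribute of a connection point."""
    ONE_WAY = "one-way"
    TWO_WAY = "two-way"
    MULTI_WAY = "multi-way"


@dataclass(frozen=True)
class Leg:
    """A leg joins one communication subject to one connection point."""
    subject: str
    point: str


@dataclass
class CallState:
    """One state of the call model (illustrative structure only)."""
    name: str
    points: dict = field(default_factory=dict)  # point id -> ConnectionType
    legs: set = field(default_factory=set)      # set of Leg


def connect_leg(state, subject, point):
    """Service component: attach a subject to an existing connection point."""
    if point not in state.points:
        raise ValueError(f"unknown connection point: {point}")
    state.legs.add(Leg(subject, point))


def disconnect_leg(state, subject, point):
    """Service component: remove a leg if it exists."""
    state.legs.discard(Leg(subject, point))


# A state like the one described for Fig. 3: a two-way voice path between
# three subjects and a one-way video path between two of them.
active = CallState("Active")
active.points["voice"] = ConnectionType.TWO_WAY
active.points["video"] = ConnectionType.ONE_WAY
for subject in ("A", "B", "C"):
    connect_leg(active, subject, "voice")
for subject in ("A", "B"):
    connect_leg(active, subject, "video")
```

Keeping the connection type on the connection point rather than on the leg follows the description above, in which connection points carry the type attributes.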

The language should be represented in a 
textual or graphical form that is similar to SDL 
(specification and description language)’. A 
graphical representation of the call model and 
service logic sequence makes them easier for the 
user to understand. 


4. Service development support functions 
4.1 Creating the specification 

In service development, after the target 
service feature has been determined, the designer 
creates the service specification. To simplify this 
creation stage, a specialized editor for the 
specification language should be available. An 
example of such an editor is a graphical editor 
that manipulates graphical representations of the
specification language by using the graphical 
user interface (GUI) functions of workstations. 
Also, automatic creation of the service logic 
specification described below is required. 

Because the formal service specification 
does not allow ambiguity, the service logic must 
be described in detail. This makes it difficult to 
write the specification. Moreover, in the case of 
multi-point, multi-connection services on a 
broadband ISDN, the complexity of the descrip-
tion grows combinatorially. Therefore, the user
cannot always be expected to write a complete 
service logic specification. 

The specification consists of main and 
supplementary sequences, the user being mostly 
concerned with the main sequences. The user
should be allowed to omit supplementary sequences
because they are required in many parts of the
specification and would otherwise have to be
written many times. To supplement the omitted specifications,
a method of automatically generating them is 
needed. Because this is difficult to do algo- 
rithmically, knowledge processing technology 
should be used. The knowledge base contains 
information about how to make the service logic 
specification, and works as described below. 

First, the knowledge base finds the sequence 
branch. (Many branches occur due to events or 
the execution of service components, and 
branches that are not described in the
specification are taken into consideration.) The 
knowledge base contains information that 
indicates the important branches; for example, 
an on-hook event is usually important, and 
therefore usually requires a supplementary 
sequence. 

Then, the knowledge base generates a 
processing plan. The states that the supple- 
mentary sequence should handle are determined 
from the cause of the supplementary sequence 
and the state of the call model. Because the 
processing usually results in one of several 
typical states, for example, path disconnection 
and error notification, it is feasible to determine 
the rules of the plan generation. 

Finally, based on the processing plan, the
knowledge base generates a processing sequence
from the service components which change the
state. The knowledge base selects the service
components based on the results of service
component execution. Then, the order of the
service components is decided based on the
constraints imposed on the order.


J. Maeda et al.: Service Creation Environment Based on Application Oriented Specification Language


Fig. 5— Methods of specification verification (service specification; lexical/syntactic verification; logical verification; semantic verification; verification by user).
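
The three steps of supplementary sequence generation (finding the branch, generating a processing plan, and generating a processing sequence) can be sketched as follows. This is only a minimal illustration: the rule tables stand in for the knowledge base, and every rule, goal, and component name in them is invented for the example.

```python
# Hypothetical knowledge base; all rule contents are invented for illustration.
IMPORTANT_EVENTS = {"ON-HOOK"}

# Plan rules: event -> goals that the supplementary sequence should reach.
PLAN_RULES = {"ON-HOOK": ["paths_disconnected", "resources_released"]}

# Which service component achieves which goal.
COMPONENT_FOR_GOAL = {
    "paths_disconnected": "DisconnectLeg",
    "resources_released": "ReleaseResources",
}

# Ordering constraints between service components: (earlier, later).
ORDER_CONSTRAINTS = [("DisconnectLeg", "ReleaseResources")]


def missing_branches(spec, possible_events):
    """Step 1: find important (state, event) branches the user did not describe."""
    return [
        (state, event)
        for state, described in spec.items()
        for event in possible_events
        if event in IMPORTANT_EVENTS and event not in described
    ]


def make_plan(event):
    """Step 2: decide which goals the supplementary sequence must reach."""
    return PLAN_RULES.get(event, [])


def make_sequence(goals):
    """Step 3: pick components for the goals and order them by the constraints."""
    remaining = {COMPONENT_FOR_GOAL[g] for g in goals}
    ordered = []
    while remaining:
        # A component is ready when nothing that must precede it is pending.
        ready = sorted(
            c for c in remaining
            if not any(a in remaining for a, b in ORDER_CONSTRAINTS if b == c)
        )
        if not ready:
            raise ValueError("ordering constraints are cyclic")
        ordered.append(ready[0])
        remaining.remove(ready[0])
    return ordered


def supplement(spec, possible_events):
    """Return generated supplementary sequences keyed by (state, event)."""
    return {
        (state, event): make_sequence(make_plan(event))
        for state, event in missing_branches(spec, possible_events)
    }
```

Here `spec` maps each state to the set of events the user has already described, so a state with no ON-HOOK branch receives a generated one.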


4.2 Verifying the specification 

The verification function corrects errors in 
the specification. The verification” method that 
is used varies with the type of error being 
corrected. The methods of specification verifica- 
tion are shown in Fig. 5 and are as follows: 
1) Lexical and syntactic 

Lexical and syntactic errors are detected by 
using ordinary lexical and syntactic analysis. 
These errors include misspellings of service 
component names, incorrect parameter values, 
and errors in the sequence of service com- 
ponents. 
2) Logical 

This method detects various contradictions 
in the specification, for example, events that 
cannot occur in the specified state and service 
components that cannot be executed in the 
specified state. One of the verification methods is 
to run the specification on a simulation and then 
check the results by using verification rules 
contained in the knowledge base. The rules 
describe constraints imposed on service execu- 
tion. The service execution can be regarded as a 
finite state machine; therefore, every action is
verified based on reachability tree analysis. 
3) Semantic 

This method detects incorrect service 
behavior, for example, providing the wrong 
announcement and connecting the wrong 
terminal. One method of detecting these errors is 
semantic analysis of the specification by using 
the knowledge base. The results of the service 
simulation are checked using semantic rules 
which represent general principles of service 
behavior. Constructing a knowledge base 
involves collecting examples of semantic errors 
that have been made in actual specifications and 
then extracting rules from them. However, it 
should be noted that because the definition of 
semantic errors is sometimes ambiguous (defini- 
tion depends on service designers), semantic 
analysis cannot always find every error. Another 
important method of detecting incorrect service 
behavior is the verification of the simulation 
output of service execution by the user. To enable
accurate evaluation of service use by using this
method, it should be possible to run a simulation
as if a real system were being operated.
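
To make the logical verification method more concrete, the sketch below checks a small state-transition specification by breadth-first reachability. It is a simplified stand-in for the reachability tree analysis mentioned above, not the analysis itself, and the data layout and function name are assumptions made for the example.

```python
from collections import deque


def check_spec(transitions, allowed_events, initial):
    """Breadth-first reachability check over a state-transition specification.

    transitions:    {state: {event: next_state}}
    allowed_events: {state: set of events that can occur in that state}
    Returns (reachable_states, errors).
    """
    reachable = {initial}
    errors = []
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for event, target in transitions.get(state, {}).items():
            # A described event that cannot occur in this state is a contradiction.
            if event not in allowed_events.get(state, set()):
                errors.append(f"event {event} cannot occur in state {state}")
                continue
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    # States that are described but can never be entered are also reported.
    for state in transitions:
        if state not in reachable:
            errors.append(f"state {state} is unreachable")
    return reachable, errors
```

In this sketch the `allowed_events` table plays the role of the verification rules; a fuller version would also check which service components may be executed in each state.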


4.3 Translation

The translation function translates the 
service logic program into a programming lan- 
guage, for example, C. An executable program is 
obtained by compiling this output. The service 
logic program consists of the message reception 
part and the service component execution part. 
The state of the call model and the event in the 
specification are mapped to the message recep-
tion part, and are used to specify the condition
under which the SSP sends the next event to the SCP.
The service component is replaced by library 
programs that are invoked after message recep- 
tion. The library programs include functional 
components (FCs) that are basic functions of the 
SCP platform. Some service components may 
have internal state transitions, in which case the 
service components are expanded into several 
parts. 


5. Prototype system
5.1 System configuration 

An SCE prototype system based on the 
concepts described above has been developed. 
The target switching system has a broadband 
ISDN switching capability based on ATM 
switching technology. Although the initial scope 
of this prototype system is limited to ordinary 
telephone services, an extension for broadband 
services is planned. The system consists of a 
Sun-4 workstation and a personal computer, 
which is used for sound generation. Figure 6 
shows the configuration of the prototype system. 
This system uses the X-Window graphical user 
interface. The editing, verification, and transla- 
tion programs are independent and access a 
common service specification database. These 
programs are written in C. 


5.2 Specification language 

The service logic specification language is 
represented in graphical form, and the specifica- 
tions are written by a graphical editor. An 
example graphical specification is shown in 
Fig. 7. The specification shows the processing 
flow of service execution control. It consists of 
the call model states, the events and the service 
components. Textual representation is also 
available. 

The state of the call model expresses the 
state of the telephone call, for example, 
Collecting Information, Analyzing Information, 
and Active (talking). Four terminal operations 
are provided as the events, for example, 
OFF-HOOK and ON-HOOK. Fifteen service
components are provided. Examples of service
components are shown in Table 1. The sequence
can be branched according to the results of the
service components.

Fig. 7—Graphical specification example.

For example, in Fig. 7, a trigger indicating 
that special number “662” has been dialed from a 
terminal is received in the Collecting Informa- 
tion state, and then the service is activated. The 
service specification instructs the service to 
output an announcement to the terminal and to 
wait for additional information from the 
terminal by using the service components. In the 
next Collecting Information state, service execu- 
tion continues when the information is entered 
from the terminal. When the terminal goes
on-hook, an exceptional sequence is executed and the
service is canceled. It is also possible to express
a conference call by combining the state of the
call model and the service components. This is
done by creating a new connection point for the
third called party and merging it with the
existing connection point.

Fig. 9— Verification example (2).


5.3 Service simulation 

A semantic verification tool that runs the 
service logic specification under a simulation 
was developed. The simulation demonstrates 
how the service is executed on the screen. Also, 
the simulation outputs various tones and 
announcements to terminals to explain the 
service execution. The simulation is operated 
directly by specifying terminal icons and
commands in menus such as OFF-HOOK,
ON-HOOK, and DIAL. In this simulation, the
service can be executed in steps. The user can
easily find errors by using a function that shows
the service execution trace on the specification.
Some examples of verification screens are shown
in Figs. 8 and 9. The connection state is shown
in the switch (SSP) box and the executed service
components are displayed in the SCP box. In
Fig. 9 the trace of the execution route is
displayed in a window on the right.

Fig. 10—Simulation mechanism.

Using this tool to repeat the verification and 
refinement stages of specification creation 
ensures sufficient verification at the semantic
level, and enables services to be developed effi- 
ciently. Because large portions of the specifica- 
tions for new services are identical or similar to 
existing specifications, reuse of existing specifi- 
cations is encouraged. This makes this simula- 
tion even more useful because it can help the 
user to understand the behavior of an existing 
specification. 

The simulation mechanism is shown 
in Fig. 10. The simulators for the terminals, 
network resources, SSP, and SCP are independ- 
ent and communicate using messages. When the 
SCP simulator receives a trigger or event 
message from the SSP simulator, it executes the 
service specification by using an interpreter and 
sends control messages to the SSP simulator. 
Simulation control integrates the simulators by 
switching the messages.
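
The message flow between independent simulators can be sketched as below. This is a minimal sketch, assuming a dictionary-based message format and a table-driven interpreter; the class names, message fields, and component names are hypothetical, and the terminal and network resource simulators are omitted.

```python
from collections import deque


class ScpSimulator:
    """Interprets a tiny specification: {(state, event): (components, next_state)}."""

    def __init__(self, spec, initial):
        self.spec, self.state, self.trace = spec, initial, []

    def receive(self, message):
        components, self.state = self.spec[(self.state, message["event"])]
        self.trace.extend(components)  # execution trace of service components
        # Each executed component becomes a control message for the SSP.
        return [{"to": "SSP", "control": c} for c in components]


class SspSimulator:
    """Forwards events to the SCP and records the control messages it receives."""

    def __init__(self):
        self.controls = []

    def receive(self, message):
        if "event" in message:
            return [{"to": "SCP", "event": message["event"]}]
        self.controls.append(message["control"])
        return []


def run(simulators, first_message):
    """Simulation control: switch each message to the simulator it names."""
    queue = deque([first_message])
    while queue:
        message = queue.popleft()
        queue.extend(simulators[message["to"]].receive(message))
```

Because the simulators only exchange messages through `run`, any one of them could be replaced without changing the others, which is the property the independent-simulator design relies on.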


5.4 Translation

The translation tool translates the service
specification written in the service logic specifi-
cation language into a service logic program. A
service logic specification can be divided into 
blocks that start from a message waiting state 
and end at the next message waiting state. First, 
the translation tool generates the framework of 
the program that contains the variable declara- 
tion and the message receiving function. Then, 
the blocks are translated in the order in which 
they appear in the specification. The translation 
procedure for each block is as follows: 

1) The conditional branches which depend on 
the message contents are generated. 

2) The service components between a message 
waiting state and the next message waiting 
state are replaced with library programs 
that include FCs by applying the service 
component expansion rules. 
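
The block-by-block translation procedure can be illustrated with a small generator that emits C-like text. This is a sketch under stated assumptions: the block format, the expansion rule table, the FC function names, and the generated identifiers (`call_t`, `msg_t`, `service_logic`) are all invented for the example and do not describe the actual output of the translation tool.

```python
# Hypothetical expansion rules: service component -> library calls.
EXPANSION_RULES = {
    "Announce": ["fc_play_announcement(call);"],
    "CollectDigits": ["fc_request_digits(call);"],
}


def translate(blocks):
    """Emit C-like source text for a list of blocks.

    Each block is (waiting_state, event, [service components], next_state)
    and runs from one message waiting state to the next.
    """
    lines = [
        "/* framework: variable declaration and message receiving function */",
        "void service_logic(call_t *call, const msg_t *msg) {",
    ]
    for state, event, components, next_state in blocks:
        # 1) Conditional branch that depends on the message contents.
        lines.append(f"    if (call->state == {state} && msg->event == {event}) {{")
        # 2) Service components replaced by library programs.
        for component in components:
            for call_text in EXPANSION_RULES[component]:
                lines.append(f"        {call_text}")
        lines.append(f"        call->state = {next_state};")
        lines.append("        return;")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)
```

Calling `translate` with one block per message waiting state yields one conditional branch per block, in the order in which the blocks appear in the specification.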


5.5 Results

Two services were created on the prototype 
system so that the system could be evaluated. 
These two services were a simple universal 
personal telecommunications (UPT) service to 
represent a typical database-oriented service, 
and a conference service to represent a call 
processing-oriented service. The specifications 
for these two services were written using the 
specification editor, and the design was verified 
using the verification tool. The evaluation 
results indicated that the method worked well. 


6. Conclusion 

A service creation environment that pro- 
vides customer programmability improves the 
development of intelligent network services. 
This paper presented an architecture that 
enables non-expert designers of telecommunica- 
tion software to develop service logic. We 
propose an application oriented specification 
language which uses an abstracted call model 
and service components. Service development is
performed at the specification level by using 
support facilities to create and verify the 
specification, and then translate it into an 
executable program. Further developments 
leading to more sophisticated services in an 
advanced telecommunication environment are 
expected. 


Neuro-Musician is an interactive music composer which plays a jam session with a
human pianist, while composing improvisations using neural networks. It employs a
new method of handling musical contexts to play several measures of a melody. A
melody generation model is invented for this purpose. This model is based on neural
networks, and utilizes the following information: melody contours, pitch and
duration, and note-on timing. Three kinds of networks are used to implement the
melody generation model. This paper presents a description of the model, its
mechanism, and an evaluation.


1. Introduction 

A computer with musical sense can help a 
person make music. Given a phrase, for example, 
such a computer composes appropriate example
phrases that follow or substitute for it. These
examples are helpful for people wishing to 
compose or play music. 

Neuro-Musician is a computer with musical 
sense. It acquires its musical sense by learning a 
musical style using neural networks. Musical 
styles are difficult to describe with rules. One of 
the best ways for a computer to acquire a 
musical style is to give it actual music. We 
attempted to teach instances of music to neural 
networks. 


2. Neural network 

A neural network can be considered as a 
simple model of the human brain. The neural 
network we used is a three-layer hierarchical 
network that consists of an input layer, a hidden 
layer, and an output layer (see Fig. 1). Each 
layer has several neuron units. The network is 
taught pairs of input data and corresponding 
desired output data by the error backprop-
agation method’. This teaching fixes the
strengths of the connections between nodes in the
network, which determine its behavior. The
network can then calculate an output for any
input. For example, if the network is given an 
input which was used during teaching, it 
generates the output it was taught. As the neural 
network is taught pairs of inputs and desired 
outputs, it acquires generalized relationships 
between them. If the network is given a new 
input that was not used during teaching, it 
generates an appropriate output using these 
generalized relationships. 
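
A three-layer network taught by error backpropagation can be sketched in a few lines of Python. This is a minimal sketch only: the sigmoid units, squared-error gradient, bias inputs, learning rate, and random initialization are common textbook choices assumed for the example, and the class and method names are hypothetical rather than those of the system described here.

```python
import math
import random


class ThreeLayerNet:
    """Input, hidden, and output layers of sigmoid units (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's incoming weights; the last entry is a bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        extended = inputs + [1.0]  # constant input for the bias weight
        return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, extended))))
                for row in weights]

    def forward(self, x):
        """Return (hidden activations, output activations) for input list x."""
        hidden = self._layer(self.w1, x)
        return hidden, self._layer(self.w2, hidden)

    def teach(self, x, target, rate=0.5):
        """One backpropagation step for a single (input, desired output) pair."""
        hidden, out = self.forward(x)
        # Error terms at the output layer, then propagated back to the hidden layer.
        d_out = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
        d_hid = [h * (1 - h) * sum(d * self.w2[k][j] for k, d in enumerate(d_out))
                 for j, h in enumerate(hidden)]
        for k, d in enumerate(d_out):
            for j, h in enumerate(hidden + [1.0]):
                self.w2[k][j] -= rate * d * h
        for j, d in enumerate(d_hid):
            for i, v in enumerate(x + [1.0]):
                self.w1[j][i] -= rate * d * v
```

Repeated calls to `teach` over the taught pairs adjust the connection strengths; `forward` can then be called with an input that was not used during teaching.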


3. History 

The “Neuro-Drummer” was our first
research project. It ran from 1988 to 1989”. We 
attempted to teach a sense of rhythm to a neural 
network, because rhythm is one aspect of 
musical style. After the neural network had
learned about forty pairs of an input rhythm
pattern and an output rhythm pattern, a
professional drummer conceded that the Neuro-
Drummer had improved greatly, and that it
usually replied with interesting rhythms.

Fig. 1— Neural network.


M. Nishijima, and K. Watanabe: Interactive Music Composer Based on Neural Networks

The “Neuro-Musician” was our next re- 
search project”. During this project, we taught a 
sense of melody to neural networks. Learning 
this sense was more difficult, because it involved 
a lot of factors including pitch, duration, 
harmony, and rhythm, and because these factors 
influence each other. 

The Neuro-Musician takes the place of one 
musician in an ad-lib session in which two 
players take turns playing. The first player plays 
a piece of music for several measures, then the 
second player, the Neuro-Musician, replies to it.
Each player needs to accept and adapt to the 
other player’s musical style. Neuro-Musician’s 
replies cannot be random, and they must make 
musical sense and be artistically satisfying. 


4. Melody generation model

We investigated how a professional jazz
musician approaches a jazz ad-lib session, and
what factors are crucial in playing a phrase 
eight or sixteen measures long. The musician we 
interviewed listed the following four major 
factors: 

1) Contour (outline) of the melody. 

2) Pitch change and rhythm (These are used to 
compose a melody that satisfies the con- 
tour). 

3) Note-on timing. 

4) Chord progression and available note scales.

When the musician plays an ad-lib session,
he or she considers all these factors together.
For each piece of music, chord progression and
available note scales can be determined.
Relationships between the first three factors
(contour, pitch and rhythm, and note-on timing)
are determined when a musician plays phrases.
The musician does not determine these relation-
ships logically, but unconsciously. A musician
will naturally play example phrases when he or
she explains these relationships.

We made a melody generation model based 
on the factors above and our discussion with the 
jazz musician. Our melody generation model is
an algorithm that lets a computer process music 
examples like a human musician would. We pay 
close attention to these three loose relationships: 
the relationship between the contour of a melody 
and the contour of the music immediately 
following the melody, the relationship between 
the contour of a melody and its detailed 
information such as pitch change and rhythm, 
and the relationship between the rhythm pattern 
and the note-on timing. The three factors are
sampled as follows:

1) Contour is described by the first, middle, and
final notes in each measure using chord
construction. In Fig. 2, the first-measure contour
data consists of the root note, the fifth note,
and the fifth note of D minor seventh (Dm7)
instead of D, A, and A, respectively.

2) Pitch means the difference between two
consecutive notes. Rhythm is in units of a
sixteenth note with strength (velocity).
The number of notes and the distribution of
pitch are also sampled from the melody. We
call the pair of these two elements note
density.

3) Note-on timing is the sampled difference
between punctual playing to a musical score
and real playing in units of the MIDI
(Musical Instrument Digital Interface)
timing clock.
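
The sampling of the three factors can be illustrated with MIDI note numbers. The sketch below is only an approximation of the idea: the chord table covers Dm7 alone, the choice of the middle note by list position is an assumption, and all function names are hypothetical. The tick values in the usage note assume the standard MIDI timing clock of 24 clocks per quarter note, so a sixteenth note spans 6 clocks.

```python
# Chord tones as semitone offsets from the root; only Dm7 is listed here.
CHORD_DEGREES = {"Dm7": {0: "root", 3: "third", 7: "fifth", 10: "seventh"}}
ROOT_PITCH_CLASS = {"Dm7": 2}  # pitch class of D


def contour(measure_notes, chord):
    """Describe the first, middle, and final notes of a measure by chord degree."""
    picks = [
        measure_notes[0],
        measure_notes[len(measure_notes) // 2],  # assumed definition of "middle"
        measure_notes[-1],
    ]
    root = ROOT_PITCH_CLASS[chord]
    return [CHORD_DEGREES[chord].get((n - root) % 12, "non-chord") for n in picks]


def pitch_changes(notes):
    """Differences in semitones between consecutive MIDI note numbers."""
    return [b - a for a, b in zip(notes, notes[1:])]


def note_on_deviation(scored_clocks, played_clocks):
    """Played onset minus scored onset, in MIDI timing-clock units."""
    return [p - s for s, p in zip(scored_clocks, played_clocks)]
```

For a measure played as D, F, A, C, A over Dm7 (MIDI notes 62, 65, 69, 72, 69), `contour` describes the first, middle, and final notes as the root, the fifth, and the fifth, and `pitch_changes` gives the steps between consecutive notes.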


5. Composing mechanism 

The composing mechanism consists of three 
procedures (see Fig. 3):

1) Make a contour (outline) of the output
melody.

2) Make a full score by giving details to the
contour.

3) Adapt the rhythm for jazz playing.

Three kinds of networks are used to implement
our mechanism, and these networks cooperate to
generate an output melody (see Fig. 4).

Fig. 3— Composing mechanism.

In the first step, a contour of an output 
melody is generated using a contour of an input 
melody. This is because most human pianists 
tend to grasp the contour first when playing jazz 
ad-lib sessions. A network (contour generator) is 
taught pairs of an input melody contour and a
desired output melody contour. This network
consists of 48 units in the input layer, 60 in the
hidden layer, and 48 in the output layer.

Fig. 5—System configuration.

In the second step, a full score for eight 
measures is made using the note density. A 
network (pitch and rhythm generator) is taught 
pairs of contour, note density, pitch, and rhythm 
(duration and accent). The network consists of 
four sub-networks. Each sub-network generates 
two-measure melody data and has 24 units in the 
input layer, 40 in the hidden layer and 32 in the 
output layer. 

In the final step, swing, which is essential
to jazz, is added to the rhythm to adapt it for
jazz playing. A network (note-on timing
generator) is taught pairs of rhythm patterns and 
note-on timing. This network has 8 units in the 
input layer, 8 in the hidden layer, and 13 in the 
output layer. 
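
To give a feel for the scale of the three kinds of networks, the following sketch tallies the connections implied by the layer sizes stated above. It assumes that adjacent layers are fully connected and ignores any bias terms; neither point is stated in the text, and the dictionary and function names are hypothetical.

```python
# Layer sizes as stated in the text: (input, hidden, output) units.
NETWORKS = {
    "contour generator": [(48, 60, 48)],
    "pitch and rhythm generator": [(24, 40, 32)] * 4,  # four sub-networks
    "note-on timing generator": [(8, 8, 13)],
}


def connection_count(n_in, n_hidden, n_out):
    """Weights between fully connected adjacent layers, ignoring bias terms."""
    return n_in * n_hidden + n_hidden * n_out


# Total connections per kind of network, summed over its sub-networks.
totals = {
    name: sum(connection_count(*sizes) for sizes in parts)
    for name, parts in NETWORKS.items()
}
```

Under these assumptions the pitch and rhythm generator is the largest of the three, because its four sub-networks each cover two measures of the eight-measure score.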


6. System configuration 

We ran this system on an FM R-70 32-bit 
personal computer and used MIDI to connect 
instruments to the computer. When a human 
musician plays an eight-measure melody, MIDI 
signals are generated and transmitted to the 
computer. The MIDI signals are converted to 
input data for the neural networks. The output 
from the neural networks is converted back into 
MIDI signals and these signals are transmitted 
to the sound generator to produce music (see 
Fig. 5). These MIDI signals are also transmitted 
to an FM TOWNS to synchronize computer 
graphics on the FM TOWNS’s monitor with
Neuro-Musician’s music. Prerecorded music for
bass and drums is synchronized with the
system.


7. Experiment and evaluation 

We experimented by having Neuro-Musician 
play a jam session with a jazz pianist. To 
compose improvisations, the neural networks 
had to learn approximately 30 eight-measure 
patterns from the musician. This experiment was 
an eight-measure trade, and the theme was 
“Satin Doll” by Duke Ellington. When the human 
pianist plays eight measures of a melody, the 
Neuro-Musician replies with eight measures. 

In this experiment, we were able to get very 
exciting “jam sessions” between the jazz pianist 
and the Neuro-Musician. We determined that the 
output from neural networks can be effective in 
playing a jam session. The Neuro-Musician plays 
responses that the human musician likes. This 
experiment did reveal, however, an unexpected
limitation: unnatural replies came out and, for
several reasons, the computer did not generate
human-like music. The Neuro-Musician generates
arbitrarily long or fast
phrases, either without rests or filled with 
sixteenth notes. This results in music that clearly 
would not be played by a human player. Another 
point is that the note-on timing patterns tend to 
be too simple. This kind of music rarely pleases 
or excites listeners. We must teach our neural 
networks additional human characteristics, such 
as limits on hand movement and breathing
intervals. It is important, for example, to
consider a relationship between the length of
phrases in the performance and breathing
intervals.


8. Conclusion 

An interactive music composer was created 
by investigating the musical factors involved in 
an ad-lib jazz session with a human musician. 
Three kinds of neural networks were able to 
generalize the relationships between musical 
factors by learning instances of music. In the 
future, we are going to refine our melody 
generation model by adding more human 
characteristics. 